Convert MediaWiki to XML

Max file size: 100 MB.

MediaWiki vs XML Format Comparison

Aspect: MediaWiki (Source Format) vs. XML (Target Format)

Format Overview

MediaWiki: MediaWiki Markup Language

Lightweight markup language (wikitext) used by Wikipedia and other MediaWiki-powered wikis; the MediaWiki software was first deployed in 2002. Uses distinctive syntax with == headings ==, '''bold''', ''italic'', [[links]], and {| tables |} for collaborative web content creation and editing.

Wiki Markup, Plain Text

XML: Extensible Markup Language

Versatile, self-describing markup language designed by the W3C for storing and transporting structured data. Uses hierarchical tag-based syntax with custom elements and attributes. The foundation of countless data formats including XHTML, SVG, RSS, SOAP, and Office Open XML. Supports schemas (XSD), stylesheets (XSLT), and namespaces.

Structured Data, W3C Standard
Technical Specifications
MediaWiki (Source Format):
Structure: Plain text with wiki markup
Encoding: UTF-8
Format: Text-based markup language
Compression: None (plain text)
Extensions: .mediawiki, .wiki, .txt

XML (Target Format):
Structure: Hierarchical tree of elements
Encoding: UTF-8 (default), UTF-16
Format: W3C standard markup language
Compression: None (can be compressed externally)
Extensions: .xml
Syntax Examples

MediaWiki uses wiki-style markup:

== Section Heading ==
'''Bold text''' and ''italic''
* Bullet list item
# Numbered list item
[[Internal Link]]
{{Template:Infobox}}

XML uses hierarchical tags:

<?xml version="1.0" encoding="UTF-8"?>
<document>
  <section level="2" title="Section Heading">
    <paragraph>
      <bold>Bold text</bold> and
      <italic>italic</italic>
    </paragraph>
  </section>
</document>
Content Support
MediaWiki (Source Format):
  • Section headings (levels 1-6)
  • Bold, italic, underline formatting
  • Bulleted and numbered lists
  • Wiki-style tables
  • Internal and external links
  • Image embedding via file references
  • Categories and templates
  • Table of contents (auto-generated)
  • References and citations
  • Infoboxes and navboxes

XML (Target Format):
  • Custom element definitions
  • Attributes on any element
  • Unlimited nesting depth
  • Mixed content (text + elements)
  • Namespaces for disambiguation
  • CDATA sections for raw content
  • Processing instructions
  • Schema validation (XSD, DTD)
  • XSLT transformation support
  • XPath query support
Advantages
MediaWiki (Source Format):
  • Powers Wikipedia and thousands of wikis
  • Built-in linking and categorization
  • Collaborative editing support
  • Auto-generated table of contents
  • Template and transclusion system
  • Version history tracking

XML (Target Format):
  • Universal data exchange format
  • Self-describing structure
  • Schema validation capability
  • XSLT for format transformation
  • Parsers available for virtually every programming language
  • Human and machine readable
Disadvantages
MediaWiki (Source Format):
  • Complex table syntax
  • Full rendering (templates, extensions) requires MediaWiki software
  • Not widely used outside wikis
  • Template syntax can be confusing
  • No native print layout support

XML (Target Format):
  • Verbose compared to JSON or YAML
  • Larger file sizes due to closing tags
  • Complex schema definitions
  • Slower to parse than binary formats
  • Overkill for simple data structures
Common Uses
MediaWiki (Source Format):
  • Wikipedia articles and pages
  • Corporate wikis and knowledge bases
  • Technical documentation wikis
  • Community-driven encyclopedias
  • Open-source project documentation

XML (Target Format):
  • Web services and APIs (SOAP, REST)
  • Configuration files (Maven, Spring)
  • Data interchange between systems
  • Document formats (OOXML, ODF)
  • RSS/Atom feeds
  • SVG graphics
Best For
MediaWiki (Source Format):
  • Wiki-based content publishing
  • Collaborative documentation
  • Knowledge base articles
  • Wikipedia contributions

XML (Target Format):
  • Enterprise data exchange
  • Validated structured documents
  • Cross-system interoperability
  • Complex hierarchical data
Version History
MediaWiki (Source Format):
Introduced: 2002 (software first deployed on Wikipedia; named MediaWiki in 2003)
Current Version: MediaWiki 1.42 (2024)
Status: Actively maintained and developed
Evolution: Regular updates with new features

XML (Target Format):
Introduced: 1998 (W3C Recommendation)
Current Version: XML 1.0 Fifth Edition (2008)
Status: Stable W3C standard
Evolution: XML 1.1 available; 1.0 remains dominant
Software Support
MediaWiki (Source Format):
MediaWiki: Native rendering engine
Wikipedia: Primary content format
Pandoc: Reads and writes MediaWiki markup
Other: Any text editor for source editing

XML (Target Format):
Languages: XML parsers in the standard library or common packages of most languages
Browsers: Built-in XML parsing and tree view
Tools: XMLSpy, Oxygen XML Editor
Other: lxml, ElementTree, DOM, SAX

Why Convert MediaWiki to XML?

Converting MediaWiki markup to XML creates a well-structured, machine-readable representation of wiki content that can be processed by virtually any software system. XML's hierarchical structure naturally maps to the document structure of wiki pages: sections become nested elements, paragraphs become child elements, and metadata like categories and links become attributes or separate elements. This structured output enables automated processing, search indexing, and data pipeline integration.

MediaWiki itself uses XML for its Special:Export feature, producing XML dumps of wiki content. However, the native MediaWiki XML export format embeds the raw wiki markup within XML tags. Converting MediaWiki markup to a properly structured XML document goes further by parsing the wiki syntax and representing each content element (headings, paragraphs, lists, tables, links) as semantic XML elements, making the content truly machine-parseable without any wiki-syntax knowledge.

This conversion is useful for content management systems that consume XML, XSLT-based publishing pipelines, enterprise search engines, and data warehousing systems. Organizations migrating from MediaWiki to XML-based documentation formats (like DITA or DocBook) benefit from having their wiki content in a structured XML format that can be further transformed into the target format using XSLT stylesheets.

The conversion produces well-formed XML with a logical document structure. Wiki headings become section elements with level attributes, formatted text uses inline elements (bold, italic), lists become ordered/unordered list structures, tables become properly nested row/cell elements, and links carry the link target and, where it differs, the display text as separate attributes. The output can optionally include an XML Schema (XSD) for validation.
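The mapping described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the converter's actual implementation: the function names wiki_to_xml and add_inline are hypothetical, only headings, bullet lists, bold, and italic are handled, and sections are emitted flat rather than nested by level.

```python
import re
import xml.etree.ElementTree as ET

HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1$")   # == Title ==
INLINE = re.compile(r"'''(.+?)'''|''(.+?)''")  # '''bold''' or ''italic''


def add_inline(parent, text):
    """Append text to parent, turning bold/italic markup into child elements."""
    last = None  # most recently added inline child, if any
    pos = 0
    for m in INLINE.finditer(text):
        chunk = text[pos:m.start()]
        if last is None:
            parent.text = (parent.text or "") + chunk
        else:
            last.tail = (last.tail or "") + chunk
        last = ET.SubElement(parent, "bold" if m.group(1) else "italic")
        last.text = m.group(1) or m.group(2)
        pos = m.end()
    rest = text[pos:]
    if last is None:
        parent.text = (parent.text or "") + rest
    else:
        last.tail = (last.tail or "") + rest


def wiki_to_xml(wikitext):
    """Convert a tiny wikitext subset to an ElementTree document (hypothetical helper)."""
    root = ET.Element("document")
    current = root   # element that receives new blocks
    bullets = None   # open <list> while consecutive bullet lines continue
    for line in wikitext.splitlines():
        line = line.strip()
        if not line:
            bullets = None
            continue
        heading = HEADING.match(line)
        if heading:
            # Flat sections only: nesting by heading level is left out of this sketch.
            current = ET.SubElement(root, "section",
                                    level=str(len(heading.group(1))),
                                    title=heading.group(2))
            bullets = None
        elif line.startswith("*"):
            if bullets is None:
                bullets = ET.SubElement(current, "list", type="unordered")
            add_inline(ET.SubElement(bullets, "item"), line[1:].strip())
        else:
            bullets = None
            add_inline(ET.SubElement(current, "paragraph"), line)
    return root


sample = "== Overview ==\n'''Python''' is ''readable''.\n\n== Features ==\n* Dynamic typing\n* Garbage collection"
print(ET.tostring(wiki_to_xml(sample), encoding="unicode"))
```

A real converter would also need links, tables, templates, numbered lists, and nested sections; the sketch only shows how wiki constructs can map onto elements and attributes.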

Key Benefits of Converting MediaWiki to XML:

  • Machine Readable: Fully parseable by XML libraries in any programming language
  • XSLT Transformation: Apply stylesheets to produce HTML, PDF, or any output format
  • Schema Validation: Validate content structure with XSD or DTD
  • Enterprise Integration: Feed wiki content into CMS, search, and data systems
  • XPath Queries: Navigate and extract specific content using XPath expressions
  • Content Pipeline: Use as input for automated publishing and documentation workflows
  • Interoperability: XML works with virtually every enterprise system and tool

Practical Examples

Example 1: Article Structure to XML

Input MediaWiki file (article.mediawiki):

= Python Programming Language =

== Overview ==
'''Python''' is a [[high-level programming language]]
known for its ''readability'' and versatility.

== Features ==
* Dynamic typing
* Garbage collection
* Multi-paradigm support

[[Category:Programming Languages]]
[[Category:Scripting Languages]]

Output XML file (article.xml):

<?xml version="1.0" encoding="UTF-8"?>
<document title="Python Programming Language">
  <section level="2" title="Overview">
    <paragraph>
      <bold>Python</bold> is a
      <link target="high-level programming language"/>
      known for its <italic>readability</italic>
      and versatility.
    </paragraph>
  </section>
  <section level="2" title="Features">
    <list type="unordered">
      <item>Dynamic typing</item>
      <item>Garbage collection</item>
      <item>Multi-paradigm support</item>
    </list>
  </section>
  <categories>
    <category>Programming Languages</category>
    <category>Scripting Languages</category>
  </categories>
</document>
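
Once the article is in this form, ordinary XML tooling can read it. The following sketch uses Python's standard-library ElementTree on a condensed copy of the output above; the summarize function is a hypothetical helper name, and the element names are simply those shown in this example.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the Example 1 output.
ARTICLE_XML = """<document title="Python Programming Language">
  <section level="2" title="Overview">
    <paragraph><bold>Python</bold> is a
      <link target="high-level programming language"/>
      known for its <italic>readability</italic> and versatility.</paragraph>
  </section>
  <section level="2" title="Features">
    <list type="unordered">
      <item>Dynamic typing</item>
      <item>Garbage collection</item>
      <item>Multi-paradigm support</item>
    </list>
  </section>
  <categories>
    <category>Programming Languages</category>
    <category>Scripting Languages</category>
  </categories>
</document>"""


def summarize(xml_text):
    """Collect the title, section titles, categories, and feature list."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.get("title"),
        "sections": [s.get("title") for s in root.iter("section")],
        "categories": [c.text for c in root.iter("category")],
        "features": [i.text for i in
                     root.findall("./section[@title='Features']/list/item")],
    }


summary = summarize(ARTICLE_XML)
print(summary["title"])
print(summary["categories"])
```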

Example 2: Table Data to XML

Input MediaWiki file (servers.mediawiki):

== Server Infrastructure ==

{| class="wikitable"
|-
! Hostname !! IP Address !! Role !! Status
|-
| web-01 || 10.0.1.10 || Web Server || Active
|-
| db-01 || 10.0.2.10 || Database || Active
|-
| cache-01 || 10.0.3.10 || Cache || Standby
|}

Output XML file (servers.xml):

<?xml version="1.0" encoding="UTF-8"?>
<document>
  <section level="2" title="Server Infrastructure">
    <table>
      <headers>
        <header>Hostname</header>
        <header>IP Address</header>
        <header>Role</header>
        <header>Status</header>
      </headers>
      <row>
        <cell>web-01</cell>
        <cell>10.0.1.10</cell>
        <cell>Web Server</cell>
        <cell>Active</cell>
      </row>
      <row>
        <cell>db-01</cell>
        <cell>10.0.2.10</cell>
        <cell>Database</cell>
        <cell>Active</cell>
      </row>
      <row>
        <cell>cache-01</cell>
        <cell>10.0.3.10</cell>
        <cell>Cache</cell>
        <cell>Standby</cell>
      </row>
    </table>
  </section>
</document>
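
Because the header cells and data rows are separate elements, the table can be turned into records with very little code. The sketch below assumes the headers/header and row/cell layout shown above and uses a condensed copy of that output; table_records is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the Example 2 output.
SERVERS_XML = """<document>
  <section level="2" title="Server Infrastructure">
    <table>
      <headers>
        <header>Hostname</header><header>IP Address</header>
        <header>Role</header><header>Status</header>
      </headers>
      <row><cell>web-01</cell><cell>10.0.1.10</cell><cell>Web Server</cell><cell>Active</cell></row>
      <row><cell>db-01</cell><cell>10.0.2.10</cell><cell>Database</cell><cell>Active</cell></row>
      <row><cell>cache-01</cell><cell>10.0.3.10</cell><cell>Cache</cell><cell>Standby</cell></row>
    </table>
  </section>
</document>"""


def table_records(xml_text):
    """Return one dict per <row>, keyed by the <header> texts of the first table."""
    table = ET.fromstring(xml_text).find(".//table")
    headers = [h.text for h in table.findall("headers/header")]
    return [dict(zip(headers, (c.text for c in row.findall("cell"))))
            for row in table.findall("row")]


records = table_records(SERVERS_XML)
standby = [r["Hostname"] for r in records if r["Status"] == "Standby"]
print(standby)
```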

Example 3: Complex Content to XML

Input MediaWiki file (release_notes.mediawiki):

== Release Notes v3.0 ==

=== New Features ===
# User authentication via [[OAuth 2.0]]
# '''Real-time notifications''' system
# Improved [[search|full-text search]]

=== Bug Fixes ===
* Fixed memory leak in cache module
* Resolved {{Bug|1234}} - login timeout

{{Note|Upgrade requires database migration.}}

Output XML file (release_notes.xml):

<?xml version="1.0" encoding="UTF-8"?>
<document>
  <section level="2" title="Release Notes v3.0">
    <section level="3" title="New Features">
      <list type="ordered">
        <item>User authentication via
          <link target="OAuth 2.0"/></item>
        <item><bold>Real-time notifications</bold>
          system</item>
        <item>Improved <link target="search"
          display="full-text search"/></item>
      </list>
    </section>
    <section level="3" title="Bug Fixes">
      <list type="unordered">
        <item>Fixed memory leak in cache module</item>
        <item>Resolved Bug #1234 - login timeout</item>
      </list>
    </section>
    <note>Upgrade requires database migration.</note>
  </section>
</document>
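
Nested sections and link attributes like these are easy to walk programmatically. The sketch below, again using standard-library ElementTree on a condensed copy of the output above, builds an indented outline and a list of links; outline and links are hypothetical helper names, and falling back to the target when no display attribute is present is an assumption drawn from this example.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the Example 3 output.
RELEASE_XML = """<document>
  <section level="2" title="Release Notes v3.0">
    <section level="3" title="New Features">
      <list type="ordered">
        <item>User authentication via <link target="OAuth 2.0"/></item>
        <item><bold>Real-time notifications</bold> system</item>
        <item>Improved <link target="search" display="full-text search"/></item>
      </list>
    </section>
    <section level="3" title="Bug Fixes">
      <list type="unordered">
        <item>Fixed memory leak in cache module</item>
        <item>Resolved Bug #1234 - login timeout</item>
      </list>
    </section>
    <note>Upgrade requires database migration.</note>
  </section>
</document>"""


def outline(element, depth=0):
    """Return section titles, indented to mirror the nesting of <section> elements."""
    lines = []
    for section in element.findall("section"):
        lines.append("  " * depth + section.get("title"))
        lines.extend(outline(section, depth + 1))
    return lines


def links(root):
    """Return (target, visible text) pairs; the target doubles as text if display is absent."""
    return [(link.get("target"), link.get("display", link.get("target")))
            for link in root.iter("link")]


root = ET.fromstring(RELEASE_XML)
print("\n".join(outline(root)))
print(links(root))
```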

Frequently Asked Questions (FAQ)

Q: What is XML format?

A: XML (Extensible Markup Language) is a W3C standard for encoding structured data in a human-readable and machine-readable text format. It uses a hierarchical tree of elements defined by opening and closing tags with custom names. XML is the foundation of many data formats (RSS, SVG, SOAP, OOXML) and is supported by every major programming language and platform.

Q: How is MediaWiki content mapped to XML structure?

A: The wiki document structure maps naturally to XML's hierarchy. The page becomes the root document element, sections become nested section elements with level attributes, paragraphs become paragraph elements, lists become ordered/unordered list structures, tables become table/row/cell hierarchies, and formatting (bold, italic) uses inline elements. Links and categories become elements with attributes.

Q: Is the output well-formed XML?

A: Yes! The converter produces well-formed XML that complies with the XML 1.0 specification. This includes a proper XML declaration, correctly nested elements, escaped special characters (&, <, >, "), and UTF-8 encoding. Any conforming XML parser can read the output without errors; validating it against a schema is a separate step that requires an XSD or DTD.
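
Why escaping matters can be seen with a short Python check. This is an illustrative sketch using only the standard library, not the converter's code; is_well_formed is a hypothetical helper, and the sample string is made up for the demonstration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def is_well_formed(xml_text):
    """Return True if a standard XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


raw = 'Use a < b && c > d with "quotes"'

unsafe = f"<paragraph>{raw}</paragraph>"          # raw & and < break the markup
safe = f"<paragraph>{escape(raw)}</paragraph>"    # escape() handles &, <, >
titled = f"<section title={quoteattr(raw)}/>"     # quoteattr() also quotes the value

print(is_well_formed(unsafe), is_well_formed(safe), is_well_formed(titled))
```

Parsing the escaped versions gives back the original string unchanged, which is what lets wiki text containing markup characters survive the round trip.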

Q: Can I transform the XML with XSLT?

A: Absolutely! The structured XML output is specifically designed to be XSLT-friendly. You can apply XSLT stylesheets to transform the wiki content into HTML pages, PDF documents (via XSL-FO), DITA topics, DocBook documents, or any other format. This makes the conversion a powerful first step in any content transformation pipeline.

Q: How are MediaWiki templates represented in XML?

A: MediaWiki templates are either converted to their text content or represented as dedicated XML elements. For example, a Note template becomes a <note> element, while a Bug template can be expanded to plain text (as in Example 3, where {{Bug|1234}} becomes "Bug #1234") or represented as a reference element with the bug number as an attribute. Complex templates like infoboxes are either expanded to their visible content or structured as metadata elements within the XML tree.

Q: Can I query the XML output with XPath?

A: Yes! The hierarchical XML structure supports XPath queries for extracting specific content. For example, //section[@title='Features'] finds every section titled Features, //link/@target extracts all link destinations, and //table/row[1]/cell returns the cells of the first data row of each table (header cells sit under a separate headers element). This makes programmatic content extraction straightforward.
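
As a rough illustration, the same queries can be run with Python's standard-library ElementTree, which supports a subset of XPath. Two assumptions apply: paths must be relative (hence the leading dot), and attribute selection such as /@target is not supported, so the attribute is read from each element instead. The sample document is made up for the demonstration; a full XPath engine such as lxml accepts the expressions as written in the answer.

```python
import xml.etree.ElementTree as ET

# Small made-up document using the element names from the examples on this page.
DOC = """<document>
  <section level="2" title="Features">
    <paragraph>See <link target="Dynamic typing"/> and
      <link target="Garbage collection"/>.</paragraph>
    <table>
      <headers><header>Hostname</header><header>Role</header></headers>
      <row><cell>web-01</cell><cell>Web Server</cell></row>
      <row><cell>db-01</cell><cell>Database</cell></row>
    </table>
  </section>
</document>"""

root = ET.fromstring(DOC)

# //section[@title='Features']
features = root.findall(".//section[@title='Features']")

# //link/@target, read via the element because ElementTree cannot select attributes
targets = [link.get("target") for link in root.iter("link")]

# //table/row[1]/cell, the cells of the first <row> in each table
first_row = [cell.text for cell in root.findall(".//table/row[1]/cell")]

print(len(features), targets, first_row)
```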

Q: Is this the same as MediaWiki's XML export?

A: No. MediaWiki's Special:Export produces XML that wraps the raw wiki markup in XML tags (the wiki text is stored as-is within a <text> element). Our conversion actually parses the wiki markup and produces semantically structured XML where each content element (heading, list, table, link) has its own proper XML representation. This produces a far more useful XML document for data processing.
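
The difference is visible when reading an export file: the parser hands back the page text as one unparsed string. The sketch below uses a heavily trimmed, illustrative export document (real exports carry more metadata, and the schema version in the namespace varies between MediaWiki releases); raw_wikitext is a hypothetical helper, and the {*} namespace wildcard requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative export document; real files contain many more fields.
EXPORT_XML = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.11/" xml:lang="en">
  <page>
    <title>Python</title>
    <revision>
      <text xml:space="preserve">== Overview ==
'''Python''' is a [[high-level programming language]].</text>
    </revision>
  </page>
</mediawiki>"""


def raw_wikitext(export_xml):
    """Map each page title to the raw wiki markup stored in its <text> element."""
    root = ET.fromstring(export_xml)
    pages = {}
    for page in root.findall("{*}page"):          # {*} matches any namespace
        title = page.findtext("{*}title")
        pages[title] = page.findtext("{*}revision/{*}text")
    return pages


for title, markup in raw_wikitext(EXPORT_XML).items():
    print(title)
    print(markup)  # still wiki syntax: headings, bold, and links are not parsed
```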

Q: Can I use the XML for DocBook or DITA conversion?

A: The structured XML output serves as an excellent intermediate format for DocBook or DITA conversion. Since the content is already parsed into semantic elements, an XSLT stylesheet can map the elements to DocBook or DITA equivalents. Sections become chapters/topics, lists map directly, and tables translate to their respective DocBook/DITA table models.
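
The kind of element mapping such a stylesheet performs can be sketched in Python. This is a stand-in written with standard-library ElementTree, not XSLT itself, and it makes several assumptions: to_docbook and RENAME are hypothetical names, the source element names are those used in the examples on this page, bold is mapped to emphasis with role="bold" (a common DocBook convention), and elements without a mapping are copied by name with their attributes dropped.

```python
import xml.etree.ElementTree as ET

# Source element name -> DocBook-style element name (illustrative subset).
RENAME = {"document": "article", "paragraph": "para",
          "italic": "emphasis", "item": "listitem"}


def to_docbook(src):
    """Recursively copy src, renaming elements to DocBook-style names."""
    attrib = {}
    if src.tag == "bold":
        tag, attrib = "emphasis", {"role": "bold"}
    elif src.tag == "list":
        tag = "orderedlist" if src.get("type") == "ordered" else "itemizedlist"
    else:
        tag = RENAME.get(src.tag, src.tag)
    out = ET.Element(tag, attrib)

    # A title attribute becomes a <title> child element.
    title = src.get("title")
    if title is not None:
        ET.SubElement(out, "title").text = title

    # DocBook list items hold block content, so wrap item content in <para>.
    target = ET.SubElement(out, "para") if src.tag == "item" else out
    target.text = src.text
    for child in src:
        converted = to_docbook(child)
        converted.tail = child.tail
        target.append(converted)
    return out


SRC = ('<document title="Python"><section level="2" title="Features">'
       '<paragraph><bold>Fast</bold> and <italic>simple</italic></paragraph>'
       '<list type="unordered"><item>Dynamic typing</item></list>'
       '</section></document>')

print(ET.tostring(to_docbook(ET.fromstring(SRC)), encoding="unicode"))
```

In a production pipeline the same rules would more typically be written as XSLT templates, and tables and links would need their own mappings.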

Q: Can I convert multiple MediaWiki files to XML at once?

A: Yes! Upload multiple MediaWiki files simultaneously and each will be independently converted to a well-formed XML document. This is ideal for batch-processing wiki dumps, migrating entire wiki sections to XML-based systems, or building XML content repositories from wiki sources.