Convert MediaWiki to JSON


MediaWiki vs JSON Format Comparison

Aspect MediaWiki (Source Format) JSON (Target Format)
Format Overview
MediaWiki
Wiki Markup Language

Lightweight markup language created by Magnus Manske and Lee Daniel Crocker for Wikipedia in 2002. Uses concise wiki syntax for headings, formatting, links, templates, and tables. Powers Wikipedia, Wiktionary, Wikimedia Commons, Fandom, and thousands of wikis across the internet.

JSON
JavaScript Object Notation

Lightweight data interchange format derived from JavaScript object literals. Designed by Douglas Crockford in the early 2000s and standardized as ECMA-404 and RFC 8259. JSON has become the dominant format for web APIs, configuration files, and data exchange between services. Human-readable yet easily parsed by machines.

Technical Specifications
MediaWiki (source format):
Structure: Plain text with wiki markup tags
Encoding: UTF-8
Format: Text-based markup language
Compression: None (plain text)
Extensions: .mediawiki, .wiki, .txt
JSON (target format):
Structure: Nested objects, arrays, values
Encoding: UTF-8 (required by RFC 8259)
Format: ECMA-404 / RFC 8259 standard
Compression: None (gzip commonly applied)
Extensions: .json
Syntax Examples

MediaWiki uses wiki markup syntax:

== Article Title ==
'''Author''': John Smith

=== Summary ===
A brief ''introduction'' to the topic.

* Point one
* Point two

{| class="wikitable"
|-
! Key !! Value
|-
| status || active
|}

JSON uses structured key-value data:

{
  "title": "Article Title",
  "author": "John Smith",
  "sections": [{
    "heading": "Summary",
    "content": "A brief introduction to the topic.",
    "items": [
      "Point one",
      "Point two"
    ]
  }],
  "data": {
    "status": "active"
  }
}
Content Support
MediaWiki (source format):
  • Headings (== to ======)
  • Bold and italic (underline via HTML <u> tags)
  • Internal and external links
  • Templates and transclusions
  • Wiki tables
  • Ordered and unordered lists
  • Categories and namespaces
  • Images and media
  • Mathematical formulas
  • References and citations
JSON (target format):
  • Objects (key-value pairs)
  • Arrays (ordered lists)
  • Strings (text values)
  • Numbers (a single type covering integers and decimals)
  • Booleans (true/false)
  • Null values
  • Nested structures (unlimited depth)
  • Unicode support (full UTF-8)
  • Schema validation (JSON Schema)
Advantages
MediaWiki (source format):
  • Easy to learn and write
  • Proven at Wikipedia scale
  • Collaborative editing support
  • Version history tracking
  • Powerful template system
  • Human-readable source
JSON (target format):
  • Universal API standard
  • Native JavaScript support
  • Built-in basic types (string, number, boolean, null)
  • Deep nesting capability
  • Schema validation available
  • Supported by virtually every programming language
  • Compact and efficient
Disadvantages
MediaWiki (source format):
  • Hard to parse reliably for data extraction
  • Requires wiki engine to render
  • Markup itself carries no structured data model
  • Template dependencies
  • Not directly consumable by most APIs
JSON (target format):
  • No comments allowed
  • Strict syntax (trailing commas fail)
  • No date type natively
  • Verbose for large datasets
  • No schema enforcement by default
  • Not ideal for document content
Common Uses
MediaWiki (source format):
  • Wikipedia articles
  • Enterprise knowledge bases
  • Technical documentation
  • Fan and community wikis
  • Educational content
JSON (target format):
  • REST API data exchange
  • Web application configuration
  • Database document storage (MongoDB)
  • Package metadata (package.json)
  • Cross-service communication
  • Mobile app data feeds
Best For
MediaWiki (source format):
  • Collaborative wiki editing
  • Encyclopedia-style content
  • Web-based documentation
  • Community knowledge bases
JSON (target format):
  • API data exchange
  • Structured data storage
  • Web application configuration
  • Cross-platform data transfer
Version History
MediaWiki (source format):
Introduced: 2002 (Wikipedia/MediaWiki)
Current Version: MediaWiki 1.42 (2024)
Status: Actively developed
Evolution: Continuous updates since 2002
JSON (target format):
Introduced: 2001 (Douglas Crockford)
Current Version: ECMA-404 / RFC 8259
Status: International standard
Evolution: Minimal changes (stable spec)
Software Support
MediaWiki (source format):
MediaWiki: Native support
Pandoc: Read/write support
Editors: Any text editor
Other: Wikipedia, Fandom, wiki engines
JSON (target format):
JavaScript: Native JSON.parse/stringify
Python: json module (stdlib)
Databases: MongoDB, PostgreSQL, MySQL
Other: Every modern language and framework

Why Convert MediaWiki to JSON?

Converting MediaWiki markup to JSON turns human-readable wiki content into structured data that APIs can serve, databases can store, and web applications can consume. Because JSON is the most widely used data interchange format on the web, the conversion gives programs direct access to content that would otherwise require a wikitext parser to read.

The conversion maps MediaWiki's document structure onto JSON's hierarchical format. Wiki headings become entries in a "sections" array, paragraph text becomes string values, lists become arrays, tables become arrays of objects, and template parameters become key-value pairs within objects. The resulting JSON preserves the semantic structure and relationships present in the original wiki content.
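The heading, paragraph, and list part of this mapping can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the converter's actual implementation: the function names, the "level" and "items" fields, and the decision to ignore text before the first heading are all choices made for the example, and templates, tables, and nested lists are not handled.

```python
import json
import re

# A heading line is 2 to 6 '=' signs, the title, then the same run of '=' signs.
HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")


def strip_inline(text):
    """Reduce [[target|label]] links to their label and drop bold/italic quotes."""
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]+)\]\]", r"\1", text)
    return re.sub(r"'{2,}", "", text)


def wikitext_to_sections(source):
    """Group paragraph text and bullet items under the heading that precedes them."""
    sections = []
    current = None
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADING.match(line)
        if match:
            current = {
                "heading": match.group(2),
                "level": len(match.group(1)),
                "content": "",
                "items": [],
            }
            sections.append(current)
        elif current is None:
            continue  # assumption: text before the first heading is skipped
        elif line.startswith("*"):
            current["items"].append(strip_inline(line.lstrip("*").strip()))
        else:
            # Consecutive paragraph lines are joined with a single space.
            current["content"] = (current["content"] + " " + strip_inline(line)).strip()
    return sections


sample = """== Summary ==
A brief ''introduction'' to the topic.

* Point one
* Point two
"""
print(json.dumps(wikitext_to_sections(sample), indent=2))
```

Keeping sections in an array rather than using heading text as object keys preserves document order and tolerates duplicate headings.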

This conversion is useful for applications that build on wiki-based knowledge. Wikipedia and enterprise wikis hold large amounts of structured information in infoboxes, tables, and categorized articles. Converting this content to JSON makes it accessible to search engines, recommendation systems, chatbots, data analysis tools, and any application that needs structured access to wiki knowledge.

For developers building content management systems, documentation APIs, or knowledge graph applications, MediaWiki-to-JSON conversion provides the data foundation. The JSON output can be loaded directly into MongoDB or other document databases, served through REST APIs, consumed by React or Vue frontend applications, and processed by data pipelines. This bridges the gap between collaborative wiki authoring and modern software architecture.

Key Benefits of Converting MediaWiki to JSON:

  • API Ready: JSON is the de facto payload format for REST and GraphQL APIs
  • Structured Data: Wiki content becomes queryable objects and arrays
  • Database Storage: Direct import into MongoDB, CouchDB, PostgreSQL JSONB
  • Universal Parsing: Virtually every programming language has JSON support
  • Schema Validation: Validate output structure with JSON Schema
  • Web Integration: Native consumption by JavaScript applications
  • Data Pipeline: Feed wiki content into analytics and ML workflows

Practical Examples

Example 1: Wiki Article to JSON Object

Input MediaWiki file (planet.mediawiki):

== Mars ==
'''Mars''' is the fourth planet from the
[[Sun]].

=== Physical Properties ===
* '''Diameter''': 6,779 km
* '''Mass''': 6.39 x 10^23 kg
* '''Moons''': 2 ([[Phobos]], [[Deimos]])

[[Category:Planets]]
[[Category:Terrestrial planets]]

Output JSON file (planet.json):

{
  "title": "Mars",
  "description": "Mars is the fourth planet from the Sun.",
  "sections": [{
    "heading": "Physical Properties",
    "properties": {
      "diameter": "6,779 km",
      "mass": "6.39 x 10^23 kg",
      "moons": ["Phobos", "Deimos"]
    }
  }],
  "categories": ["Planets", "Terrestrial planets"]
}

Example 2: Wiki Table to JSON Array

Input MediaWiki file (employees.mediawiki):

== Team Members ==

{| class="wikitable"
|-
! Name !! Role !! Department
|-
| Alice Johnson || Lead Developer || Engineering
|-
| Bob Smith || Designer || UX Design
|-
| Carol White || PM || Operations
|}

Output JSON file (employees.json):

{
  "title": "Team Members",
  "data": [
    {
      "name": "Alice Johnson",
      "role": "Lead Developer",
      "department": "Engineering"
    },
    {
      "name": "Bob Smith",
      "role": "Designer",
      "department": "UX Design"
    },
    {
      "name": "Carol White",
      "role": "PM",
      "department": "Operations"
    }
  ]
}
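
The table-to-array step in this example could be approximated as follows. This is a hedged sketch, not the converter's code: parse_wikitable is a hypothetical helper, and it assumes a simple table with one header row written with "!!" separators and one data row per line written with "||" separators. Captions, cell attributes, and multi-line cells are not handled.

```python
def parse_wikitable(source):
    """Turn a simple {| ... |} table into a list of dicts keyed by header cells."""
    headers = []
    rows = []
    for line in source.splitlines():
        line = line.strip()
        # Skip blank lines, the table open/close markers, and row separators.
        if not line or line.startswith(("{|", "|}", "|-")):
            continue
        if line.startswith("!"):
            # Header cells become lowercase keys with underscores.
            headers = [cell.strip().lower().replace(" ", "_")
                       for cell in line[1:].split("!!")]
        elif line.startswith("|"):
            cells = [cell.strip() for cell in line[1:].split("||")]
            rows.append(dict(zip(headers, cells)))
    return rows


table = """{| class="wikitable"
|-
! Name !! Role !! Department
|-
| Alice Johnson || Lead Developer || Engineering
|-
| Bob Smith || Designer || UX Design
|}"""
team = parse_wikitable(table)
```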

Example 3: Infobox Template to JSON

Input MediaWiki file (city.mediawiki):

{{Infobox city
| name = Tokyo
| country = Japan
| population = 13960000
| area_km2 = 2194
| timezone = JST (UTC+9)
| mayor = Yuriko Koike
}}

== Overview ==
'''Tokyo''' is the capital of [[Japan]].

Output JSON file (city.json):

{
  "infobox": {
    "type": "city",
    "name": "Tokyo",
    "country": "Japan",
    "population": 13960000,
    "area_km2": 2194,
    "timezone": "JST (UTC+9)",
    "mayor": "Yuriko Koike"
  },
  "sections": [{
    "heading": "Overview",
    "content": "Tokyo is the capital of Japan."
  }]
}
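
One way the infobox portion of this example might be produced is shown below. The sketch assumes one "| key = value" parameter per line and no nested templates. parse_infobox and coerce are hypothetical names, and the numeric rule (plain integers and simple decimals only) is an assumption rather than the converter's documented behavior.

```python
import re


def coerce(value):
    """Return an int or float for purely numeric text, else the original string."""
    if re.fullmatch(r"-?\d+", value):
        return int(value)
    if re.fullmatch(r"-?\d+\.\d+", value):
        return float(value)
    return value


def parse_infobox(source):
    """Parse the first {{Infobox ...}} block; returns None if there is none."""
    infobox = None
    for line in source.splitlines():
        line = line.strip()
        if infobox is None:
            if line.lower().startswith("{{infobox"):
                # The text after "Infobox" is kept as the "type" field.
                infobox = {"type": line[len("{{infobox"):].strip()}
            continue
        if line.startswith("}}"):
            break
        if line.startswith("|") and "=" in line:
            # Split on the first "=" only, so values may contain "=" themselves.
            key, _, value = line[1:].partition("=")
            infobox[key.strip()] = coerce(value.strip())
    return infobox


city = parse_infobox("""{{Infobox city
| name = Tokyo
| population = 13960000
| timezone = JST (UTC+9)
}}""")
```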

Frequently Asked Questions (FAQ)

Q: How is the wiki structure mapped to JSON?

A: The conversion creates a hierarchical JSON object that mirrors the wiki document structure. The page title becomes the root "title" field, each heading becomes the "heading" field of an object in the "sections" array, paragraphs become "content" string values, lists become JSON arrays, tables become arrays of objects (using header cells as keys), and template parameters become key-value pairs in nested objects.

Q: Are wiki infobox templates converted to structured JSON?

A: Yes. Infobox templates are among the wiki elements best suited to JSON conversion. Each named parameter in the infobox becomes a key-value pair in a JSON object. Numeric values are converted to JSON numbers, boolean-like values become true/false, and list values become arrays. The template name is stored as a "type" field (for example, "Infobox city" yields "type": "city"), enabling type-based processing in your application.

Q: Can I use the JSON output with a REST API?

A: Absolutely. The JSON output is ready for direct use with REST APIs. You can serve it through Express.js, Flask, Django REST Framework, or any API framework. The structured format with typed values (strings, numbers, arrays, objects) maps naturally to API response schemas. You can also validate the output against a JSON Schema to ensure consistent API responses.

Q: How does the conversion handle wiki links in JSON?

A: Internal wiki links ([[Page Name|Display Text]]) are converted to JSON objects with "target" and "text" fields, or simplified to plain text strings depending on the conversion mode. External links are preserved with their URLs. This allows applications to reconstruct hyperlinks or process link relationships programmatically. Link targets can be used to build knowledge graphs or navigation structures.
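
The two link modes described above might look like this in Python. The "target" and "text" field names follow the answer; the function names and the regular expression are assumptions made for this sketch, and it covers only plain [[Page]] and [[Page|Label]] links, not external links or file embeds.

```python
import re

# Matches [[Target]] and [[Target|Label]]; nested brackets are not supported.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def extract_links(text):
    """Return one {"target", "text"} object per internal link."""
    links = []
    for match in WIKILINK.finditer(text):
        target = match.group(1).strip()
        label = match.group(2)
        # Without a label, the link displays its target.
        links.append({"target": target, "text": label.strip() if label else target})
    return links


def links_to_text(text):
    """Simplified mode: replace each link with its display text."""
    return WIKILINK.sub(lambda m: (m.group(2) or m.group(1)).strip(), text)


line = "Tokyo is the capital of [[Japan]]; see [[Greater Tokyo Area|the metro area]]."
links = extract_links(line)
plain = links_to_text(line)
```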

Q: Can the JSON output be imported into MongoDB?

A: Yes, the JSON output is directly compatible with MongoDB's document format. You can use mongoimport to load the JSON files, or insert them programmatically using any MongoDB driver. Each wiki page becomes a document in your collection, with the nested structure preserved as embedded documents and arrays. MongoDB's flexible schema handles varying wiki page structures seamlessly.
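
As a hedged illustration of preparing several converted pages for import: mongoimport can read a file containing one JSON document per line (or a single JSON array when run with --jsonArray). The helper below writes that line-per-document form. The function name, the file name, and the database and collection names in the comment are placeholders, not part of the converter.

```python
import json


def to_ndjson(pages):
    """Serialize each page dict as one compact JSON document per line."""
    lines = [json.dumps(page, ensure_ascii=False, separators=(",", ":"))
             for page in pages]
    return "\n".join(lines) + "\n"


pages = [
    {"title": "Mars", "categories": ["Planets", "Terrestrial planets"]},
    {"title": "Tokyo", "infobox": {"type": "city", "population": 13960000}},
]
ndjson = to_ndjson(pages)

# Hypothetical next step, with placeholder names:
#   write `ndjson` to pages.ndjson, then run
#   mongoimport --db wiki --collection pages --file pages.ndjson
```

json.dumps escapes any newline inside a string value, so each document stays on a single line.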

Q: What happens to wiki categories in JSON?

A: Wiki categories ([[Category:Name]]) are extracted and stored in a "categories" array at the root level of the JSON document. This makes it easy to filter, sort, and group wiki content by category in your application. Sub-categories can be represented as nested arrays or with parent-child relationships, depending on the depth of category information available.
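
Category extraction can be sketched with a single regular expression. This is an assumption-laden example rather than the converter's code: extract_categories is a hypothetical name, sort keys after a pipe are discarded, and links written with a leading colon ([[:Category:Name]]), which only link to a category page, are deliberately left alone.

```python
import re

# Matches [[Category:Name]] and [[Category:Name|sort key]], case-insensitively.
CATEGORY = re.compile(
    r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]",
    re.IGNORECASE,
)


def extract_categories(source):
    """Return (category names, source text with the category tags removed)."""
    categories = [match.group(1) for match in CATEGORY.finditer(source)]
    body = CATEGORY.sub("", source).rstrip()
    return categories, body


page = """Mars is the fourth planet from the Sun.

[[Category:Planets]]
[[Category:Terrestrial planets]]
"""
categories, body = extract_categories(page)
```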

Q: Is the JSON output minified or pretty-printed?

A: The default output is pretty-printed with indentation for human readability. This makes it easy to inspect and debug the converted content. For production use where file size matters, you can minify the JSON by removing whitespace. Most JSON libraries offer both options; in Python, for example, json.dumps takes indent=2 for pretty-printing or separators=(',', ':') for minified output.
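
A short Python illustration of the two output styles, using a small made-up page object. Both strings decode to the same data; only the whitespace differs.

```python
import json

# Sample data for the illustration only.
page = {
    "title": "Team Members",
    "data": [{"name": "Alice Johnson", "role": "Lead Developer"}],
}

# Pretty-printed: two-space indentation, one key per line.
pretty = json.dumps(page, indent=2, ensure_ascii=False)

# Minified: no spaces after "," or ":" and no line breaks.
minified = json.dumps(page, separators=(",", ":"), ensure_ascii=False)

print(pretty)
print(minified)
```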

Q: How are wiki references and citations handled?

A: References (<ref> tags) are extracted and stored in a "references" array. Each reference becomes a JSON object with fields for the reference text, URL (if present), author, title, and other citation metadata. This structured approach makes it easy to build bibliography databases, citation indices, and reference management features from wiki content.
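
A minimal sketch of the extraction step, assuming plain <ref>...</ref> pairs. The function name and the numbered [n] markers left in the body are choices made for this example, and the sample text and URL are invented. Only the reference text and a URL are captured here; author and title fields would require parsing citation templates, which this sketch does not attempt. Self-closing reuse tags such as <ref name="x" /> are left untouched.

```python
import re

# Opening <ref> with optional attributes, then the shortest content up to </ref>.
REF = re.compile(r"<ref(?:\s+[^>/]*)?>(.*?)</ref>", re.DOTALL | re.IGNORECASE)
URL = re.compile(r"https?://[^\s\]|}<]+")


def extract_references(source):
    """Return (references, body) with each <ref> replaced by a [n] marker."""
    references = []

    def replace(match):
        text = match.group(1).strip()
        url = URL.search(text)
        references.append({"text": text, "url": url.group(0) if url else None})
        return f"[{len(references)}]"

    body = REF.sub(replace, source)
    return references, body


sample = ('Mars has two moons.<ref>Sample citation, https://example.org/mars</ref> '
          'It looks red.<ref name="color">Observation notes</ref>')
references, body = extract_references(sample)
```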