Convert MediaWiki to JSON
Max file size: 100 MB.
MediaWiki vs JSON Format Comparison
| Aspect | MediaWiki (Source Format) | JSON (Target Format) |
|---|---|---|
| Format Overview | MediaWiki (Wiki Markup Language): Lightweight markup language created by Magnus Manske and Lee Daniel Crocker for Wikipedia in 2002. Uses concise wiki syntax for headings, formatting, links, templates, and tables. Powers Wikipedia, Wiktionary, Wikimedia Commons, Fandom, and thousands of other wikis. | JSON (JavaScript Object Notation): Lightweight data interchange format derived from JavaScript object literals. Designed by Douglas Crockford in the early 2000s and standardized as ECMA-404 and RFC 8259. JSON has become the dominant format for web APIs, configuration files, and data exchange between services. Human-readable yet easily parsed by machines. |
| Technical Specifications | Structure: Plain text with wiki markup tags; Encoding: UTF-8; Format: Text-based markup language; Compression: None (plain text); Extensions: .mediawiki, .wiki, .txt | Structure: Nested objects, arrays, values; Encoding: UTF-8 (required by RFC 8259); Format: ECMA-404 / RFC 8259 standard; Compression: None (gzip commonly applied); Extensions: .json |
| Syntax Examples |
MediaWiki uses wiki markup syntax:
== Article Title ==
'''Author''': John Smith
=== Summary ===
A brief ''introduction'' to the topic.
* Point one
* Point two
{| class="wikitable"
|-
! Key !! Value
|-
| status || active
|}
|
JSON uses structured key-value data:
{
  "title": "Article Title",
  "author": "John Smith",
  "sections": [{
    "heading": "Summary",
    "content": "A brief introduction to the topic.",
    "items": [
      "Point one",
      "Point two"
    ]
  }],
  "data": {
    "status": "active"
  }
}
|
| Version History | Introduced: 2002 (Wikipedia/MediaWiki); Current Version: MediaWiki 1.42 (2024); Status: Actively developed; Evolution: Continuous updates since 2002 | Introduced: 2001 (Douglas Crockford); Current Version: ECMA-404 / RFC 8259; Status: International standard; Evolution: Minimal changes (stable spec) |
| Software Support | MediaWiki: Native support; Pandoc: Read/write support; Editors: Any text editor; Other: Wikipedia, Fandom, wiki engines | JavaScript: Native JSON.parse/stringify; Python: json module (stdlib); Databases: MongoDB, PostgreSQL, MySQL; Other: Every modern language and framework |
Why Convert MediaWiki to JSON?
Converting MediaWiki markup to JSON turns human-readable wiki content into structured, machine-processable data that can be consumed by APIs, stored in databases, and integrated into web applications. JSON is the most widely used data interchange format on the web, so the conversion gives programs direct access to content that would otherwise require a wiki markup parser.
The conversion maps MediaWiki's document structure onto JSON's hierarchy. Wiki headings become entries in a "sections" array, paragraph text becomes string values, lists become arrays, tables become arrays of objects, and template parameters become key-value pairs within objects. The resulting JSON preserves the semantic structure and relationships present in the original wiki content.
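The mapping described above can be sketched in a few lines of Python. This is a minimal illustration, not the converter's actual implementation: the function names `strip_inline` and `wikitext_to_dict` are hypothetical, the first heading is assumed to act as the page title, and only headings, paragraphs, and `*` list items are handled.

```python
import json
import re

# A heading is 2-6 "=" signs, the text, then the same run of "=" signs.
HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")
# [[Target]] or [[Target|Label]]; group 1 is the text a reader would see.
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")


def strip_inline(text):
    """Keep link display text and drop bold/italic quote runs."""
    return re.sub(r"'{2,}", "", LINK.sub(r"\1", text))


def wikitext_to_dict(source):
    """Map headings, paragraphs, and list items to a nested dict (sketch)."""
    doc = {"title": None, "content": [], "sections": []}
    target = doc  # where paragraphs and list items currently go
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADING.match(line)
        if match:
            text = strip_inline(match.group(2))
            if doc["title"] is None:
                doc["title"] = text  # assumption: first heading is the title
            else:
                target = {"heading": text, "content": [], "items": []}
                doc["sections"].append(target)
        elif line.startswith("*"):
            item = strip_inline(line.lstrip("*").strip())
            target.setdefault("items", []).append(item)
        else:
            target["content"].append(strip_inline(line))
    # Join collected paragraph lines into a single string per level.
    for part in [doc, *doc["sections"]]:
        part["content"] = " ".join(part["content"])
    return doc


SAMPLE = """== Article Title ==
'''Author''': John Smith
=== Summary ===
A brief ''introduction'' to the topic.
* Point one
* Point two
"""

print(json.dumps(wikitext_to_dict(SAMPLE), indent=2))
```

A real converter would also need to handle tables, templates, nested lists, and nested heading levels, which this sketch ignores.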
This conversion is essential for building applications that leverage wiki-based knowledge. Wikipedia and enterprise wikis contain vast amounts of structured information in infoboxes, tables, and categorized articles. Converting this content to JSON makes it accessible to search engines, recommendation systems, chatbots, data analysis tools, and any application that needs structured access to wiki knowledge.
For developers building content management systems, documentation APIs, or knowledge graph applications, MediaWiki-to-JSON conversion provides the data foundation. The JSON output can be loaded directly into MongoDB or other document databases, served through REST APIs, consumed by React or Vue frontend applications, and processed by data pipelines. This bridges the gap between collaborative wiki authoring and modern software architecture.
Key Benefits of Converting MediaWiki to JSON:
- API Ready: JSON is the native format for REST and GraphQL APIs
- Structured Data: Wiki content becomes queryable objects and arrays
- Database Storage: Direct import into MongoDB, CouchDB, PostgreSQL JSONB
- Universal Parsing: Every programming language has JSON support
- Schema Validation: Validate output structure with JSON Schema
- Web Integration: Native consumption by JavaScript applications
- Data Pipeline: Feed wiki content into analytics and ML workflows
Practical Examples
Example 1: Wiki Article to JSON Object
Input MediaWiki file (planet.mediawiki):
== Mars ==
'''Mars''' is the fourth planet from the [[Sun]].
=== Physical Properties ===
* '''Diameter''': 6,779 km
* '''Mass''': 6.39 x 10^23 kg
* '''Moons''': 2 ([[Phobos]], [[Deimos]])
[[Category:Planets]]
[[Category:Terrestrial planets]]
Output JSON file (planet.json):
{
  "title": "Mars",
  "description": "Mars is the fourth planet from the Sun.",
  "sections": [{
    "heading": "Physical Properties",
    "properties": {
      "diameter": "6,779 km",
      "mass": "6.39 x 10^23 kg",
      "moons": ["Phobos", "Deimos"]
    }
  }],
  "categories": ["Planets", "Terrestrial planets"]
}
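One way the "properties" object in Example 1 could be derived is sketched below. The helper name `list_to_properties` is hypothetical, and the rule that a value containing wiki links becomes an array of link targets is an assumption inferred from the "moons" entry, not a documented behavior of the converter.

```python
import re

# Matches list items of the form: * '''Key''': value
ITEM = re.compile(r"^\*\s*'''(.+?)'''\s*:\s*(.+)$")
# Captures the target of [[Target]] or [[Target|Label]].
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def list_to_properties(lines):
    """Turn bold-key list items into a dict (illustrative sketch)."""
    props = {}
    for line in lines:
        match = ITEM.match(line.strip())
        if not match:
            continue  # not a key/value list item
        key = match.group(1).strip().lower().replace(" ", "_")
        value = match.group(2).strip()
        links = LINK.findall(value)
        # Assumption: linked values are reduced to a list of link targets.
        props[key] = links if links else value
    return props


properties = list_to_properties([
    "* '''Diameter''': 6,779 km",
    "* '''Mass''': 6.39 x 10^23 kg",
    "* '''Moons''': 2 ([[Phobos]], [[Deimos]])",
])
print(properties)
```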
Example 2: Wiki Table to JSON Array
Input MediaWiki file (employees.mediawiki):
== Team Members ==
{| class="wikitable"
|-
! Name !! Role !! Department
|-
| Alice Johnson || Lead Developer || Engineering
|-
| Bob Smith || Designer || UX Design
|-
| Carol White || PM || Operations
|}
Output JSON file (employees.json):
{
  "title": "Team Members",
  "data": [
    {
      "name": "Alice Johnson",
      "role": "Lead Developer",
      "department": "Engineering"
    },
    {
      "name": "Bob Smith",
      "role": "Designer",
      "department": "UX Design"
    },
    {
      "name": "Carol White",
      "role": "PM",
      "department": "Operations"
    }
  ]
}
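The table-to-array mapping in Example 2 can be approximated with a small line-based parser. This is a sketch under simplifying assumptions: `parse_wikitable` is a hypothetical name, header cells are assumed to sit on one line separated by `!!`, data cells on one line separated by `||`, and cell attributes, captions, and nested tables are not handled.

```python
def parse_wikitable(source):
    """Convert a simple wikitable into a list of dicts keyed by header cells."""
    headers, rows, row = [], [], []
    for raw in source.splitlines():
        line = raw.strip()
        if line.startswith("{|"):
            continue  # table opening line, attributes ignored
        if line.startswith("|-") or line.startswith("|}"):
            if row:  # a row separator or table end closes the current row
                rows.append(row)
                row = []
            continue
        if line.startswith("!"):
            headers = [cell.strip() for cell in line[1:].split("!!")]
        elif line.startswith("|"):
            row.extend(cell.strip() for cell in line[1:].split("||"))
    if row:  # tolerate a missing closing "|}"
        rows.append(row)
    keys = [header.lower().replace(" ", "_") for header in headers]
    return [dict(zip(keys, cells)) for cells in rows]


TABLE = """{| class="wikitable"
|-
! Name !! Role !! Department
|-
| Alice Johnson || Lead Developer || Engineering
|-
| Bob Smith || Designer || UX Design
|-
| Carol White || PM || Operations
|}"""

for record in parse_wikitable(TABLE):
    print(record)
```

Lower-casing the header cells to form keys mirrors the example output above; a converter could just as well keep the original capitalization.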
Example 3: Infobox Template to JSON
Input MediaWiki file (city.mediawiki):
{{Infobox city
| name = Tokyo
| country = Japan
| population = 13960000
| area_km2 = 2194
| timezone = JST (UTC+9)
| mayor = Yuriko Koike
}}
== Overview ==
'''Tokyo''' is the capital of [[Japan]].
Output JSON file (city.json):
{
  "infobox": {
    "type": "city",
    "name": "Tokyo",
    "country": "Japan",
    "population": 13960000,
    "area_km2": 2194,
    "timezone": "JST (UTC+9)",
    "mayor": "Yuriko Koike"
  },
  "sections": [{
    "heading": "Overview",
    "content": "Tokyo is the capital of Japan."
  }]
}
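The infobox mapping in Example 3 could be implemented roughly as follows. The function name `parse_infobox` is hypothetical, and the sketch assumes a flat template: parameter values that contain nested templates or piped links would be split incorrectly.

```python
import re

# Captures the infobox type and the raw parameter block up to the first "}}".
INFOBOX = re.compile(r"\{\{\s*Infobox\s+([^|\n}]+)(.*?)\}\}", re.DOTALL | re.IGNORECASE)


def parse_infobox(source):
    """Extract the first flat {{Infobox ...}} template as a dict, or None."""
    match = INFOBOX.search(source)
    if match is None:
        return None
    box = {"type": match.group(1).strip()}
    # Parameters are separated by "|"; the chunk before the first "|" is empty.
    for part in match.group(2).split("|")[1:]:
        key, sep, value = part.partition("=")
        if not sep:
            continue  # positional parameter, skipped in this sketch
        value = value.strip()
        # Assumption: all-digit values are emitted as JSON numbers.
        box[key.strip()] = int(value) if value.isdigit() else value
    return box


CITY = """{{Infobox city
| name = Tokyo
| country = Japan
| population = 13960000
| area_km2 = 2194
| timezone = JST (UTC+9)
| mayor = Yuriko Koike
}}
== Overview ==
'''Tokyo''' is the capital of [[Japan]].
"""

print(parse_infobox(CITY))
```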
Frequently Asked Questions (FAQ)
Q: How is the wiki structure mapped to JSON?
A: The conversion creates a hierarchical JSON object that mirrors the wiki document structure. The page title becomes the root "title" field, each heading becomes an object in the "sections" array (with the heading text stored under "heading"), paragraphs become "content" string values, lists become JSON arrays, tables become arrays of objects (using header cells as keys), and template parameters become key-value pairs in nested objects.
Q: Are wiki infobox templates converted to structured JSON?
A: Yes! Infobox templates are one of the best-suited wiki elements for JSON conversion. Each named parameter in the infobox becomes a key-value pair in a JSON object. Numeric values are converted to JSON numbers, boolean-like values become true/false, and list values become arrays. The template name, without the "Infobox" prefix, is stored as a "type" field (for example, "Infobox city" becomes "type": "city"), enabling type-based processing in your application.
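The value typing described in this answer might look like the sketch below. The exact rules are assumptions made for illustration: `coerce` is a hypothetical helper, "yes"/"no"/"true"/"false" are treated as boolean-like, and `<br>` tags are taken as the list separator, which is a common but not universal infobox convention.

```python
import re

INTEGER = re.compile(r"-?\d+")
DECIMAL = re.compile(r"-?\d+\.\d+")
LINE_BREAK = re.compile(r"\s*<br\s*/?>\s*", re.IGNORECASE)


def coerce(value):
    """Convert a raw infobox value to a JSON-friendly Python type (sketch)."""
    text = value.strip()
    lowered = text.lower()
    if lowered in ("yes", "true"):
        return True
    if lowered in ("no", "false"):
        return False
    if INTEGER.fullmatch(text):
        return int(text)
    if DECIMAL.fullmatch(text):
        return float(text)
    parts = [part for part in LINE_BREAK.split(text) if part]
    if len(parts) > 1:  # assumption: <br>-separated values form a list
        return parts
    return text  # anything else stays a string


for raw in ["13960000", "35.68", "yes", "JST (UTC+9)", "Phobos<br />Deimos"]:
    print(repr(raw), "->", repr(coerce(raw)))
```

Using explicit patterns rather than calling `float()` on every value avoids accepting strings such as "inf" or "nan", which have no valid JSON number representation.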
Q: Can I use the JSON output with a REST API?
A: Absolutely. The JSON output is ready for direct use with REST APIs. You can serve it through Express.js, Flask, Django REST Framework, or any API framework. The structured format with typed values (strings, numbers, arrays, objects) maps naturally to API response schemas. You can also validate the output against a JSON Schema to ensure consistent API responses.
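Before serving converted pages from an API, it can help to check that each document has the expected shape. The sketch below is a hand-written stand-in for JSON Schema validation using only the standard library; it is not a JSON Schema implementation, and the field names in `EXPECTED` are assumptions taken from the examples on this page.

```python
# Assumed top-level fields and their Python types, based on the examples above.
EXPECTED = {"title": str, "sections": list, "categories": list}


def validate_page(doc):
    """Return a list of problems found in a converted page (empty if none)."""
    if not isinstance(doc, dict):
        return ["document must be a JSON object"]
    errors = []
    for field, kind in EXPECTED.items():
        if field in doc and not isinstance(doc[field], kind):
            errors.append(f"{field}: expected {kind.__name__}")
    if "title" not in doc:
        errors.append("title: missing")
    sections = doc.get("sections", [])
    if isinstance(sections, list):
        for index, section in enumerate(sections):
            if not isinstance(section, dict) or not isinstance(section.get("heading"), str):
                errors.append(f"sections[{index}]: expected object with string heading")
    return errors


good = {"title": "Mars", "sections": [{"heading": "Physical Properties"}], "categories": ["Planets"]}
bad = {"sections": "oops"}
print(validate_page(good))
print(validate_page(bad))
```

For production use, a dedicated JSON Schema validator would give more complete checks and standard error reporting.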
Q: How does the conversion handle wiki links in JSON?
A: Internal wiki links ([[Page Name|Display Text]]) are converted to JSON objects with "target" and "text" fields, or simplified to plain text strings depending on the conversion mode. External links are preserved with their URLs. This allows applications to reconstruct hyperlinks or process link relationships programmatically. Link targets can be used to build knowledge graphs or navigation structures.
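Both link modes mentioned in this answer can be sketched with regular expressions. The names `extract_links` and `links_to_text` are hypothetical, category links are skipped here on the assumption that they are handled separately, and the patterns do not cover every link form MediaWiki accepts.

```python
import re

# [[Target]] or [[Target|Display Text]]
INTERNAL = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]*))?\]\]")
# [https://example.org Label]; the lookbehind avoids matching inside "[[".
EXTERNAL = re.compile(r"(?<!\[)\[(https?://[^\s\]]+)(?:\s+([^\]]*))?\]")


def extract_links(text):
    """Return internal links as target/text objects and external links with URLs."""
    internal = [
        {"target": target.strip(), "text": (label or target).strip()}
        for target, label in INTERNAL.findall(text)
        if not target.strip().lower().startswith("category:")
    ]
    external = [
        {"url": url, "text": (label or url).strip()}
        for url, label in EXTERNAL.findall(text)
    ]
    return {"internal": internal, "external": external}


def links_to_text(text):
    """Plain-text mode: replace each internal link with its display text."""
    return INTERNAL.sub(lambda m: m.group(2) or m.group(1), text)


SOURCE = "See [[Mars|the red planet]] and [[Sun]], or [https://example.org Example site]."
print(extract_links(SOURCE))
print(links_to_text("[[Mars|the red planet]] orbits the [[Sun]]"))
```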
Q: Can the JSON output be imported into MongoDB?
A: Yes, the JSON output is directly compatible with MongoDB's document format. You can use mongoimport to load the JSON files, or insert them programmatically using any MongoDB driver. Each wiki page becomes a document in your collection, with the nested structure preserved as embedded documents and arrays. MongoDB's flexible schema handles varying wiki page structures seamlessly.
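When importing many pages at once, one convenient layout is a file with one JSON document per line, which mongoimport reads by default. The sketch below prepares such a file with the standard library only; the function names and the file name `pages.jsonl` are illustrative.

```python
import json


def to_json_lines(pages):
    """Serialize each page dict as one compact JSON document per line."""
    return "".join(json.dumps(page, ensure_ascii=False) + "\n" for page in pages)


def write_json_lines(pages, path):
    """Write the newline-delimited documents to a UTF-8 file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(to_json_lines(pages))


pages = [
    {"title": "Mars", "categories": ["Planets", "Terrestrial planets"]},
    {"title": "Tokyo", "infobox": {"type": "city", "population": 13960000}},
]
print(to_json_lines(pages), end="")
```

The resulting file could then be loaded with a command along the lines of `mongoimport --db wiki --collection pages --file pages.jsonl`, where the database and collection names are placeholders. A single file containing a JSON array would need the `--jsonArray` option instead.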
Q: What happens to wiki categories in JSON?
A: Wiki categories ([[Category:Name]]) are extracted and stored in a "categories" array at the root level of the JSON document. This makes it easy to filter, sort, and group wiki content by category in your application. Sub-categories can be represented as nested arrays or with parent-child relationships, depending on the depth of category information available.
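Category extraction is one of the simpler steps and can be sketched as below. `extract_categories` is a hypothetical helper; it returns a flat list and does not attempt to resolve sub-category relationships, which would require information beyond a single page's markup.

```python
import re

# [[Category:Name]] or [[Category:Name|sort key]], case-insensitive namespace.
CATEGORY = re.compile(r"\[\[\s*Category\s*:\s*([^\]|]+)(?:\|[^\]]*)?\]\]", re.IGNORECASE)


def extract_categories(text):
    """Return (categories, remaining_text) with category links removed."""
    categories = [name.strip() for name in CATEGORY.findall(text)]
    remaining = CATEGORY.sub("", text).strip()
    return categories, remaining


PAGE = (
    "'''Mars''' is the fourth planet from the [[Sun]].\n"
    "[[Category:Planets]]\n"
    "[[Category:Terrestrial planets]]"
)
categories, body = extract_categories(PAGE)
print(categories)
print(body)
```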
Q: Is the JSON output minified or pretty-printed?
A: The default output is pretty-printed with indentation for human readability. This makes it easy to inspect and debug the converted content. For production use where file size matters, you can minify the JSON by removing insignificant whitespace. Most JSON libraries support both styles: in Python, json.dumps with indent=2 pretty-prints, and separators=(',', ':') minifies.
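The two output styles can be compared directly with Python's standard json module. The sample document is illustrative.

```python
import json

doc = {"title": "Mars", "categories": ["Planets", "Terrestrial planets"]}

# Pretty-printed: two-space indentation, one key or array item per line.
pretty = json.dumps(doc, indent=2, ensure_ascii=False)

# Minified: no spaces after "," or ":" separators.
minified = json.dumps(doc, separators=(",", ":"), ensure_ascii=False)

print(pretty)
print(minified)
print(len(pretty), "vs", len(minified), "characters")
```

Both strings parse back to the same data, so the choice only affects size and readability.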
Q: How are wiki references and citations handled?
A: References (<ref> tags) are extracted and stored in a "references" array. Each reference becomes a JSON object with fields for the reference text, URL (if present), author, title, and other citation metadata. This structured approach makes it easy to build bibliography databases, citation indices, and reference management features from wiki content.
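A reduced version of this reference extraction is sketched below. It only captures the raw reference text and the first URL found in it; splitting out author, title, and other citation metadata would require parsing citation templates and is not attempted here. The name `extract_references` and the "[n]" marker format are assumptions, and self-closing reference tags that reuse a named reference are left untouched.

```python
import re

# <ref>...</ref> or <ref name="x">...</ref>; self-closing tags are not matched.
REF = re.compile(r"<ref(?:\s+[^>/]*)?>(.*?)</ref>", re.DOTALL | re.IGNORECASE)
URL = re.compile(r"https?://[^\s\]|}<]+")


def extract_references(text):
    """Replace each <ref> with a numbered marker and collect reference objects."""
    references = []

    def replace(match):
        body = match.group(1).strip()
        url = URL.search(body)
        references.append({"text": body, "url": url.group(0) if url else None})
        return f"[{len(references)}]"

    cleaned = REF.sub(replace, text)
    return cleaned, references


ARTICLE = (
    "Mars has two moons.<ref>NASA fact sheet [https://example.org/mars Mars facts]</ref> "
    "It is cold.<ref name=\"temp\">Smith, ''Planets''</ref>"
)
cleaned, references = extract_references(ARTICLE)
print(cleaned)
print(references)
```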