Convert MediaWiki to CSV
MediaWiki vs CSV Format Comparison
| Aspect | MediaWiki (Source Format) | CSV (Target Format) |
|---|---|---|
| Format Overview | MediaWiki (MediaWiki Markup Language): Wiki markup language developed by Magnus Manske and Lee Daniel Crocker for Wikipedia in 2002. Features == headings ==, '''bold''', ''italic'', [[links]], templates, and complex table markup using {\| and \|}. Used by Wikipedia, Fandom, and thousands of wikis for structured content. | CSV (Comma-Separated Values): Simple tabular data format where values are separated by commas and records by newlines. Defined in RFC 4180, CSV is the most universal format for exchanging structured data between spreadsheets, databases, and applications. Human-readable and supported by virtually every data tool. |
| Technical Specifications | Type: Wiki markup language; Encoding: UTF-8; MIME Type: text/x-wiki; Extensions: .mediawiki, .wiki, .txt; Structure: Hierarchical document; Data Model: Document-oriented | Type: Tabular data format; Encoding: UTF-8, ASCII, various; MIME Type: text/csv; Extensions: .csv; Structure: Rows and columns (2D table); Standard: RFC 4180 |
| Syntax Examples |
MediaWiki table markup:
{| class="wikitable"
|-
! Name !! Population !! Country
|-
| Paris || 2,161,000 || France
|-
| London || 8,982,000 || UK
|-
| Tokyo || 13,960,000 || Japan
|}
CSV output:
Name,Population,Country
Paris,"2,161,000",France
London,"8,982,000",UK
Tokyo,"13,960,000",Japan
| Version History | Introduced: 2002 (for Wikipedia); Creators: Magnus Manske, Lee Daniel Crocker; Status: Actively maintained; Evolution: Parsoid, VisualEditor, Lua | Introduced: Early 1970s (IBM mainframes); Standard: RFC 4180 (2005); Status: Stable, universal standard; Evolution: Minimal changes over decades |
| Software Support | MediaWiki: Native rendering; Pandoc: Read/write support; Editors: VisualEditor, WikiEditor; Other: Parsoid, wiki tools | Microsoft Excel: Native import/export; Google Sheets: Full support; Databases: All major RDBMS; Other: LibreOffice, Python pandas, R |
Why Convert MediaWiki to CSV?
Converting MediaWiki markup to CSV format allows you to extract structured tabular data from wiki pages into a universally compatible spreadsheet format. Wikipedia and other wikis contain vast amounts of data stored in wiki tables that are rich in information but locked in wiki markup syntax. CSV conversion unlocks this data for use in spreadsheets like Excel and Google Sheets, databases, data analysis tools, and programming workflows.
MediaWiki tables use a dedicated markup syntax: {| starts a table, |- starts a new row, ! introduces header cells (with !! separating header cells on the same line), | introduces data cells (with || separating data cells on the same line), and |} ends the table. This syntax renders well on a wiki page, but data tools cannot read it directly. Converting to CSV turns each wiki table row into a comma-separated line that spreadsheet applications, database systems, and programming languages can import.
CSV (Comma-Separated Values) is one of the most widely supported data exchange formats. Documented in RFC 4180, it represents tabular data as plain text with commas separating values and line breaks separating rows. Virtually every spreadsheet application, database, programming language, and data analysis tool can read it. Converting wiki data to CSV makes it available for sorting, filtering, charting, statistical analysis, and data-driven reporting.
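The table syntax described above can be walked line by line. The following is a minimal sketch in Python, not the converter's actual implementation: the function names `wikitable_to_rows` and `rows_to_csv` are hypothetical, and the parser assumes a single simple table with one row of cells per line and no cell attributes, nested tables, or multi-line cells.

```python
import csv
import io


def wikitable_to_rows(markup: str) -> list[list[str]]:
    """Parse one simple MediaWiki table into a list of rows of cell text."""
    rows: list[list[str]] = []
    current: list[str] = []
    in_table = False
    for raw in markup.splitlines():
        line = raw.strip()
        if line.startswith("{|"):        # table start
            in_table = True
        elif not in_table:
            continue                     # ignore everything outside the table
        elif line.startswith("|}"):      # table end
            break
        elif line.startswith("|-"):      # row separator: flush the current row
            if current:
                rows.append(current)
            current = []
        elif line.startswith("|+"):      # table caption: not part of the data
            continue
        elif line.startswith("!"):       # header cells, separated by !!
            current.extend(c.strip() for c in line[1:].split("!!"))
        elif line.startswith("|"):       # data cells, separated by ||
            current.extend(c.strip() for c in line[1:].split("||"))
    if current:                          # flush the last row
        rows.append(current)
    return rows


def rows_to_csv(rows: list[list[str]]) -> str:
    """Serialize rows as CSV text; fields containing commas get quoted."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


markup = """{| class="wikitable"
|-
! Name !! Population !! Country
|-
| Paris || 2,161,000 || France
|}"""

print(rows_to_csv(wikitable_to_rows(markup)), end="")
# Name,Population,Country
# Paris,"2,161,000",France
```

The checks for `|}` and `|-` come before the generic `|` check because all three start with the same character.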
This conversion is especially valuable for researchers, data analysts, and developers who need to work with data published on Wikipedia or other wikis. Geographic data, population statistics, historical timelines, comparison tables, and scientific datasets stored in wiki tables can be extracted to CSV for further analysis. The conversion strips formatting markup and preserves the raw data values.
Key Benefits of Converting MediaWiki to CSV:
- Data Extraction: Extract tabular data from wiki pages for analysis and processing
- Spreadsheet Ready: Open directly in Excel, Google Sheets, or LibreOffice Calc
- Database Import: Load wiki data into MySQL, PostgreSQL, SQLite, or any RDBMS
- Universal Format: Compatible with every data tool and programming language
- Clean Data: Strip all wiki markup, leaving only raw data values
- Data Analysis: Use with pandas, R, SPSS, or any analytical tool
- Automation Friendly: Easy to process programmatically in batch workflows
Practical Examples
Example 1: Wikipedia Data Table to Spreadsheet
Input MediaWiki file (countries.mediawiki):
== Countries by GDP ==
{| class="wikitable sortable"
|-
! Country !! GDP (Billion $) !! Population !! Region
|-
| [[United States]] || 25,462 || 331,900,000 || North America
|-
| [[China]] || 17,963 || 1,412,600,000 || Asia
|-
| [[Japan]] || 4,231 || 125,700,000 || Asia
|-
| [[Germany]] || 4,072 || 83,200,000 || Europe
|}
Output CSV file (countries.csv):
Country,GDP (Billion $),Population,Region
United States,"25,462","331,900,000",North America
China,"17,963","1,412,600,000",Asia
Japan,"4,231","125,700,000",Asia
Germany,"4,072","83,200,000",Europe
Example 2: Wiki Comparison Table to Analysis Data
Input MediaWiki file (languages.mediawiki):
{| class="wikitable"
|-
! Language !! Paradigm !! Year !! Creator
|-
| [[Python (programming language)|Python]] || Multi-paradigm || 1991 || Guido van Rossum
|-
| [[JavaScript]] || Multi-paradigm || 1995 || Brendan Eich
|-
| [[Rust (programming language)|Rust]] || Multi-paradigm || 2015 || Graydon Hoare
|}
Output CSV file (languages.csv):
Language,Paradigm,Year,Creator
Python,Multi-paradigm,1991,Guido van Rossum
JavaScript,Multi-paradigm,1995,Brendan Eich
Rust,Multi-paradigm,2015,Graydon Hoare
Example 3: Wiki Content to Database Records
Input MediaWiki file (inventory.mediawiki):
== Product Inventory ==
{| class="wikitable"
|-
! SKU !! Product Name !! Price !! Stock !! Category
|-
| A001 || '''Widget Pro''' || $29.99 || 150 || Electronics
|-
| B002 || ''Gadget Mini'' || $14.50 || 320 || Accessories
|-
| C003 || Basic Tool Kit || $49.99 || 75 || Tools
|}
{{Updated|March 2026}}
Output CSV file (inventory.csv):
SKU,Product Name,Price,Stock,Category
A001,Widget Pro,$29.99,150,Electronics
B002,Gadget Mini,$14.50,320,Accessories
C003,Basic Tool Kit,$49.99,75,Tools
Frequently Asked Questions (FAQ)
Q: What data from MediaWiki is included in the CSV?
A: The converter extracts tabular data from MediaWiki tables (marked with {| and |} syntax). Table headers become the CSV header row, and each table row becomes a CSV data row. Wiki formatting markup (bold, italic, links) is stripped, leaving only the plain text content of each cell. Non-table content like headings and paragraphs is not included in the CSV output.
Q: What happens to wiki links in table cells?
A: Wiki links like [[United States]] are converted to their display text ("United States"). Piped links like [[Python (programming language)|Python]] use the display text ("Python"). External links are converted to their anchor text. This ensures the CSV contains clean, readable data values without any wiki markup syntax.
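As an illustration of the link handling described above, here is a rough Python sketch using regular expressions. It is an assumption about how such stripping could work, not the converter's own code: the name `strip_wiki_markup` is hypothetical, and the patterns cover only plain links, piped links, labeled external links, and bold/italic quotes (not templates, file links, or nested markup).

```python
import re

# [[Target]] or [[Target|Label]]: keep the label if present, else the target.
INTERNAL_LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")
# [https://example.org Label]: keep the label (anchor text).
EXTERNAL_LINK = re.compile(r"\[https?://\S+\s+([^\]]+)\]")
# '''bold''' and ''italic'' markers: runs of two to five apostrophes.
QUOTE_MARKS = re.compile(r"'{2,5}")


def strip_wiki_markup(cell: str) -> str:
    """Reduce a table cell to its plain display text."""
    cell = INTERNAL_LINK.sub(r"\1", cell)
    cell = EXTERNAL_LINK.sub(r"\1", cell)
    cell = QUOTE_MARKS.sub("", cell)
    return cell.strip()


print(strip_wiki_markup("[[United States]]"))
# United States
print(strip_wiki_markup("[[Python (programming language)|Python]]"))
# Python
print(strip_wiki_markup("'''Widget Pro'''"))
# Widget Pro
```

Single apostrophes, as in "don't", are left alone because the quote pattern requires at least two in a row.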
Q: How are commas in data handled?
A: Following the CSV standard (RFC 4180), values containing commas are enclosed in double quotes. For example, a population value of "1,412,600,000" is properly quoted in the CSV output. Values containing double quotes are escaped by doubling them. This ensures proper parsing by all CSV-compatible software.
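The quoting rules above match the default behavior of Python's standard `csv` module, which makes for a quick way to see them in action. This sketch only demonstrates the rules; the helper name `encode_rows` and the sample "Nickname" column are made up for illustration.

```python
import csv
import io


def encode_rows(rows: list[list[str]]) -> str:
    """Write rows as CSV with RFC 4180 style CRLF line endings."""
    buf = io.StringIO()
    # Default dialect: quote only fields that need it, double embedded quotes.
    csv.writer(buf, lineterminator="\r\n").writerows(rows)
    return buf.getvalue()


rows = [
    ["City", "Population", "Nickname"],
    ["Paris", "2,161,000", 'The "City of Light"'],
]

csv_text = encode_rows(rows)
print(csv_text, end="")
# City,Population,Nickname
# Paris,"2,161,000","The ""City of Light"""

# Reading the text back recovers the original values unchanged.
parsed = list(csv.reader(io.StringIO(csv_text, newline="")))
print(parsed == rows)
# True
```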
Q: Can I extract multiple tables from one MediaWiki page?
A: Yes, if the MediaWiki page contains multiple tables, the converter extracts data from all tables. Each table is either merged into a single CSV (if columns match) or separated into distinct sections. For pages with tables of different structures, you may want to process them individually for the cleanest results.
Q: What about merged cells or colspan in wiki tables?
A: CSV format does not support merged cells. When a MediaWiki table contains colspan or rowspan attributes, the converter expands the merged cells by repeating the value across the appropriate columns or rows. This maintains the data grid structure that CSV requires while preserving all information from the original table.
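One way to picture the expansion described above is the following Python sketch. It is a simplified illustration under stated assumptions, not the converter's implementation: `parse_cell` and `expand_spans` are hypothetical names, cells are assumed to carry attributes in the form `colspan="2" | text`, and each cell is modeled as a `(text, colspan, rowspan)` tuple.

```python
import re

SPAN_ATTR = re.compile(r'(colspan|rowspan)\s*=\s*"?(\d+)"?')


def parse_cell(cell: str) -> tuple[str, int, int]:
    """Split 'attrs | text' into (text, colspan, rowspan); spans default to 1."""
    attrs, sep, text = cell.partition("|")
    # Only treat the left part as attributes if it looks like key=value pairs,
    # so a piped link such as [[A|B]] is not mistaken for an attribute list.
    if not sep or "=" not in attrs or "[[" in attrs:
        return (cell.strip(), 1, 1)
    spans = {name: int(value) for name, value in SPAN_ATTR.findall(attrs)}
    return (text.strip(), spans.get("colspan", 1), spans.get("rowspan", 1))


def expand_spans(rows: list[list[tuple[str, int, int]]]) -> list[list[str]]:
    """Repeat spanned values so every row becomes a full list of columns."""
    carried: dict[tuple[int, int], str] = {}  # (row, col) claimed by a rowspan
    grid: list[list[str]] = []
    for r, row in enumerate(rows):
        out: list[str] = []
        for text, colspan, rowspan in row:
            # Fill positions already claimed by a rowspan from a row above.
            while (r, len(out)) in carried:
                out.append(carried.pop((r, len(out))))
            start = len(out)
            out.extend([text] * colspan)
            # Reserve the same columns in the following rows.
            for dr in range(1, rowspan):
                for c in range(start, start + colspan):
                    carried[(r + dr, c)] = text
        while (r, len(out)) in carried:  # claimed positions at the row's end
            out.append(carried.pop((r, len(out))))
        grid.append(out)
    return grid


table = [
    [parse_cell("Region"), parse_cell('colspan="2" | 2023')],
    [parse_cell('rowspan="2" | Asia'), parse_cell("Q1"), parse_cell("Q2")],
    [parse_cell("Q3"), parse_cell("Q4")],
]
for line in expand_spans(table):
    print(line)
# ['Region', '2023', '2023']
# ['Asia', 'Q1', 'Q2']
# ['Asia', 'Q3', 'Q4']
```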
Q: Can I open the CSV in Excel or Google Sheets?
A: Absolutely! CSV is natively supported by Microsoft Excel, Google Sheets, LibreOffice Calc, Apple Numbers, and virtually every spreadsheet application. Simply open the .csv file or import it. For proper Unicode support (especially with non-Latin characters), some applications may require selecting UTF-8 encoding during import.
Q: How is non-table wiki content handled?
A: The CSV conversion focuses on extracting tabular data. Non-table content such as headings, paragraphs, lists, templates, and references is not included in the CSV output, as CSV is a flat data format that cannot represent hierarchical document structures. If you need the full document content, consider converting to a format like TXT, HTML, or JSON instead.
Q: Can I use the CSV for data analysis with Python or R?
A: Yes, CSV is one of the most common formats for data analysis. In Python, use pandas.read_csv() to load the data into a DataFrame. In R, use read.csv() to import it. Both handle CSV natively and provide data manipulation, statistical analysis, and visualization capabilities for the extracted wiki data.
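A short pandas sketch of that workflow follows, using the data from Example 1 inlined as a string so that it is self-contained (in practice you would pass the path to your .csv file). It assumes pandas is installed. Values such as "25,462" keep their thousands separators in the CSV, so the `thousands=","` option is passed to have pandas parse them as numbers rather than text.

```python
import io

import pandas as pd

csv_text = """Country,GDP (Billion $),Population,Region
United States,"25,462","331,900,000",North America
China,"17,963","1,412,600,000",Asia
Japan,"4,231","125,700,000",Asia
Germany,"4,072","83,200,000",Europe
"""

# thousands="," turns "25,462" into the integer 25462 during parsing.
df = pd.read_csv(io.StringIO(csv_text), thousands=",")

# Example analysis: total GDP per region.
gdp_by_region = df.groupby("Region")["GDP (Billion $)"].sum()
print(gdp_by_region)
```

Without the `thousands` option, the GDP and Population columns would load as strings and would need cleaning before any arithmetic.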