Convert MediaWiki to CSV

Drag and drop files here or click to select.
Maximum file size: 100 MB.
Upload progress:

MediaWiki vs CSV Format Comparison

Each aspect below compares MediaWiki (the source format) with CSV (the target format).
Format Overview

MediaWiki (MediaWiki Markup Language): A wiki markup language developed by Magnus Manske and Lee Daniel Crocker for Wikipedia in 2002. It features == headings ==, '''bold''', ''italic'', [[links]], templates, and complex table markup using {| and |}. Used by Wikipedia, Fandom, and thousands of other wikis for structured content.

CSV (Comma-Separated Values): A simple tabular data format in which values are separated by commas and records by newlines. Defined in RFC 4180, CSV is the most universally supported format for exchanging structured data between spreadsheets, databases, and applications. It is human-readable and supported by virtually every data tool.
Technical Specifications

MediaWiki
  • Type: Wiki markup language
  • Encoding: UTF-8
  • MIME Type: text/x-wiki
  • Extensions: .mediawiki, .wiki, .txt
  • Structure: Hierarchical document
  • Data Model: Document-oriented

CSV
  • Type: Tabular data format
  • Encoding: UTF-8, ASCII, various
  • MIME Type: text/csv
  • Extensions: .csv
  • Structure: Rows and columns (2D table)
  • Standard: RFC 4180
Syntax Examples

MediaWiki table markup:

{| class="wikitable"
|-
! Name !! Population !! Country
|-
| Paris || 2,161,000 || France
|-
| London || 8,982,000 || UK
|-
| Tokyo || 13,960,000 || Japan
|}

CSV output:

Name,Population,Country
Paris,"2,161,000",France
London,"8,982,000",UK
Tokyo,"13,960,000",Japan
Content Support

MediaWiki
  • Hierarchical document structure
  • Rich text formatting
  • Complex table layouts
  • Links and references
  • Templates and transclusion
  • Categories and metadata
  • Images and media
  • Nested content structures

CSV
  • Flat tabular data only
  • Plain text values
  • Comma-delimited columns
  • Newline-delimited rows
  • Quoted fields for special characters
  • Optional header row
  • No formatting or styling
  • No nested structures
Advantages

MediaWiki
  • Rich structured content
  • Visual table rendering
  • Templates for data presentation
  • Links to related content
  • Powers Wikipedia tables
  • Collaborative editing

CSV
  • Universal compatibility
  • Opens in any spreadsheet
  • Importable to any database
  • Minimal file size
  • Simple and human-readable
  • Machine-parseable without special libraries
  • Standard data exchange format
Disadvantages

MediaWiki
  • Complex table syntax
  • Data not easily extractable
  • Requires parser for data access
  • Mixed content and presentation
  • Not suitable for data analysis

CSV
  • No formatting or styling
  • No hierarchical data support
  • Commas in data need quoting
  • No data type information
  • No multi-sheet support
  • Encoding ambiguity
Common Uses

MediaWiki
  • Wikipedia data tables
  • Knowledge base content
  • Reference and comparison tables
  • Wiki-based databases
  • Structured wiki content

CSV
  • Spreadsheet data exchange
  • Database import/export
  • Data analysis and reporting
  • Application data feeds
  • Bulk data processing
Best For

MediaWiki
  • Visual data presentation
  • Structured wiki articles
  • Collaborative data editing
  • Contextual data with narrative

CSV
  • Data portability and exchange
  • Spreadsheet workflows
  • Database operations
  • Automated data processing
Version History

MediaWiki
  • Introduced: 2002 (for Wikipedia)
  • Creators: Magnus Manske, Lee Daniel Crocker
  • Status: Actively maintained
  • Evolution: Parsoid, VisualEditor, Lua

CSV
  • Introduced: Early 1970s (IBM mainframes)
  • Standard: RFC 4180 (2005)
  • Status: Stable, universal standard
  • Evolution: Minimal changes over decades
Software Support

MediaWiki
  • MediaWiki: Native rendering
  • Pandoc: Read/write support
  • Editors: VisualEditor, WikiEditor
  • Other: Parsoid, wiki tools

CSV
  • Microsoft Excel: Native import/export
  • Google Sheets: Full support
  • Databases: All major RDBMS
  • Other: LibreOffice, Python pandas, R

Why Convert MediaWiki to CSV?

Converting MediaWiki markup to CSV format allows you to extract structured tabular data from wiki pages into a universally compatible spreadsheet format. Wikipedia and other wikis contain vast amounts of data stored in wiki tables that are rich in information but locked in wiki markup syntax. CSV conversion unlocks this data for use in spreadsheets like Excel and Google Sheets, databases, data analysis tools, and programming workflows.

MediaWiki tables use a distinctive markup syntax: {| opens a table, |- separates rows, ! introduces header cells, and | introduces data cells (with || separating multiple cells on one line), until |} closes the table. While this syntax renders beautifully in a wiki browser, it is not directly usable by data tools. Converting to CSV transforms these wiki tables into clean comma-separated rows that any spreadsheet application, database system, or programming language can import instantly.
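A minimal table-to-CSV converter built on these tokens can be sketched in Python. This is a simplification that assumes plain tables: no templates, nested tables, or cell attributes.

```python
import csv
import io

def wikitable_to_csv(markup: str) -> str:
    """Convert one simple MediaWiki table to a CSV string (sketch only)."""
    rows, current = [], []
    for line in markup.splitlines():
        line = line.strip()
        if not line or line.startswith("{|"):
            continue                                  # table start or blank line
        if line in ("|-", "|}"):
            if current:                               # row separator / table end
                rows.append(current)
                current = []
            continue
        if line.startswith("!"):                      # header cells, split on !!
            current += [c.strip() for c in line.lstrip("!").split("!!")]
        elif line.startswith("|"):                    # data cells, split on ||
            current += [c.strip() for c in line.lstrip("|").split("||")]
    if current:
        rows.append(current)
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()

print(wikitable_to_csv("{|\n|-\n! Name !! Country\n|-\n| Paris || France\n|}"))
# Name,Country
# Paris,France
```

Because the rows go through Python's csv writer, values that contain commas come out properly quoted without any extra handling.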

CSV (Comma-Separated Values) is arguably the most universal data exchange format in existence. Defined in RFC 4180, it represents tabular data as plain text with commas separating values and newlines separating rows. Virtually every spreadsheet application, database, programming language, and data analysis tool supports CSV. Converting wiki data to CSV opens up possibilities for sorting, filtering, charting, statistical analysis, and data-driven reporting.

This conversion is especially valuable for researchers, data analysts, and developers who need to work with data published on Wikipedia or other wikis. Geographic data, population statistics, historical timelines, comparison tables, and scientific datasets stored in wiki tables can be extracted to CSV for further analysis. The conversion strips formatting markup and preserves the raw data values.

Key Benefits of Converting MediaWiki to CSV:

  • Data Extraction: Extract tabular data from wiki pages for analysis and processing
  • Spreadsheet Ready: Open directly in Excel, Google Sheets, or LibreOffice Calc
  • Database Import: Load wiki data into MySQL, PostgreSQL, SQLite, or any RDBMS
  • Universal Format: Compatible with every data tool and programming language
  • Clean Data: Strip all wiki markup, leaving only raw data values
  • Data Analysis: Use with pandas, R, SPSS, or any analytical tool
  • Automation Friendly: Easy to process programmatically in batch workflows
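As a sketch of the automation point above, a converted table can be processed with nothing beyond Python's standard library; the sample rows here are inlined for illustration.

```python
import csv
import io

# Inline CSV standing in for a converted wiki table (hypothetical sample data).
data = """Name,Population,Country
Paris,"2,161,000",France
London,"8,982,000",UK
"""

# Each row arrives as a dict keyed by the header; quoted commas survive parsing.
totals = []
for row in csv.DictReader(io.StringIO(data)):
    totals.append((row["Name"], int(row["Population"].replace(",", ""))))

print(totals)  # [('Paris', 2161000), ('London', 8982000)]
```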

Practical Examples

Example 1: Wikipedia Data Table to Spreadsheet

Input MediaWiki file (countries.mediawiki):

== Countries by GDP ==

{| class="wikitable sortable"
|-
! Country !! GDP (Billion $) !! Population !! Region
|-
| [[United States]] || 25,462 || 331,900,000 || North America
|-
| [[China]] || 17,963 || 1,412,600,000 || Asia
|-
| [[Japan]] || 4,231 || 125,700,000 || Asia
|-
| [[Germany]] || 4,072 || 83,200,000 || Europe
|}

Output CSV file (countries.csv):

Country,GDP (Billion $),Population,Region
United States,"25,462","331,900,000",North America
China,"17,963","1,412,600,000",Asia
Japan,"4,231","125,700,000",Asia
Germany,"4,072","83,200,000",Europe

Example 2: Wiki Comparison Table to Analysis Data

Input MediaWiki file (languages.mediawiki):

{| class="wikitable"
|-
! Language !! Paradigm !! Year !! Creator
|-
| [[Python (programming language)|Python]] || Multi-paradigm || 1991 || Guido van Rossum
|-
| [[JavaScript]] || Multi-paradigm || 1995 || Brendan Eich
|-
| [[Rust (programming language)|Rust]] || Multi-paradigm || 2015 || Graydon Hoare
|}

Output CSV file (languages.csv):

Language,Paradigm,Year,Creator
Python,Multi-paradigm,1991,Guido van Rossum
JavaScript,Multi-paradigm,1995,Brendan Eich
Rust,Multi-paradigm,2015,Graydon Hoare

Example 3: Wiki Content to Database Records

Input MediaWiki file (inventory.mediawiki):

== Product Inventory ==

{| class="wikitable"
|-
! SKU !! Product Name !! Price !! Stock !! Category
|-
| A001 || '''Widget Pro''' || $29.99 || 150 || Electronics
|-
| B002 || ''Gadget Mini'' || $14.50 || 320 || Accessories
|-
| C003 || Basic Tool Kit || $49.99 || 75 || Tools
|}

{{Updated|March 2026}}

Output CSV file (inventory.csv):

SKU,Product Name,Price,Stock,Category
A001,Widget Pro,$29.99,150,Electronics
B002,Gadget Mini,$14.50,320,Accessories
C003,Basic Tool Kit,$49.99,75,Tools
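Output like this loads into a database in a few lines; here is a sketch using Python's built-in sqlite3 module. The table schema and column names are assumptions for illustration, not something the converter produces.

```python
import csv
import io
import sqlite3

# Inline CSV matching the inventory example above.
data = """SKU,Product Name,Price,Stock,Category
A001,Widget Pro,$29.99,150,Electronics
B002,Gadget Mini,$14.50,320,Accessories
C003,Basic Tool Kit,$49.99,75,Tools
"""

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
conn.execute(
    "CREATE TABLE inventory (sku TEXT, name TEXT, price TEXT, stock INTEGER, category TEXT)"
)

reader = csv.reader(io.StringIO(data))
next(reader)  # skip the header row
conn.executemany("INSERT INTO inventory VALUES (?, ?, ?, ?, ?)", reader)

total = conn.execute("SELECT SUM(stock) FROM inventory").fetchone()[0]
print(total)  # 545
```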

Frequently Asked Questions (FAQ)

Q: What data from MediaWiki is included in the CSV?

A: The converter extracts tabular data from MediaWiki tables (marked with {| and |} syntax). Table headers become the CSV header row, and each table row becomes a CSV data row. Wiki formatting markup (bold, italic, links) is stripped, leaving only the plain text content of each cell. Non-table content like headings and paragraphs is not included in the CSV output.

Q: What happens to wiki links in table cells?

A: Wiki links like [[United States]] are converted to their display text ("United States"). Piped links like [[Python (programming language)|Python]] use the display text ("Python"). External links are converted to their anchor text. This ensures the CSV contains clean, readable data values without any wiki markup syntax.
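This substitution can be sketched with a single regular expression in Python; it is a simplification that ignores nested constructs inside links.

```python
import re

# [[Target|Display]] -> Display, [[Target]] -> Target
LINK = re.compile(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]")

def strip_links(cell: str) -> str:
    """Replace wiki link markup in a cell with its display text."""
    return LINK.sub(r"\1", cell)

print(strip_links("[[United States]]"))                         # United States
print(strip_links("[[Python (programming language)|Python]]"))  # Python
```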

Q: How are commas in data handled?

A: Following the CSV standard (RFC 4180), values containing commas are enclosed in double quotes. For example, a population value of "1,412,600,000" is properly quoted in the CSV output. Values containing double quotes are escaped by doubling them. This ensures proper parsing by all CSV-compatible software.
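Python's standard csv module applies exactly these RFC 4180 rules; a quick demonstration:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
# A comma-bearing value gets quoted; an embedded quote gets quoted and doubled.
writer.writerow(["China", "1,412,600,000", 'nicknamed "Zhongguo"'])
print(buf.getvalue())
# China,"1,412,600,000","nicknamed ""Zhongguo"""
```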

Q: Can I extract multiple tables from one MediaWiki page?

A: Yes, if the MediaWiki page contains multiple tables, the converter extracts data from all tables. Each table is either merged into a single CSV (if columns match) or separated into distinct sections. For pages with tables of different structures, you may want to process them individually for the cleanest results.

Q: What about merged cells or colspan in wiki tables?

A: CSV format does not support merged cells. When a MediaWiki table contains colspan or rowspan attributes, the converter expands the merged cells by repeating the value across the appropriate columns or rows. This maintains the data grid structure that CSV requires while preserving all information from the original table.
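The colspan expansion can be sketched as follows, assuming cells arrive as raw strings and using a simplified pattern for the colspan attribute (real wiki cells can carry other attributes this sketch ignores):

```python
import re

# Matches a cell like: colspan="2" | Asia
COLSPAN = re.compile(r'colspan="(\d+)"\s*\|\s*(.*)')

def expand_row(cells):
    """Repeat a cell's value across the columns its colspan covers."""
    out = []
    for cell in cells:
        m = COLSPAN.match(cell)
        if m:
            out.extend([m.group(2).strip()] * int(m.group(1)))
        else:
            out.append(cell)
    return out

print(expand_row(['colspan="2" | Asia', "Japan"]))
# ['Asia', 'Asia', 'Japan']
```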

Q: Can I open the CSV in Excel or Google Sheets?

A: Absolutely! CSV is natively supported by Microsoft Excel, Google Sheets, LibreOffice Calc, Apple Numbers, and virtually every spreadsheet application. Simply open the .csv file or import it. For proper Unicode support (especially with non-Latin characters), some applications may require selecting UTF-8 encoding during import.
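If a spreadsheet application misdetects the encoding, one common workaround is saving the CSV with a UTF-8 byte-order mark, which older Excel versions use to recognize UTF-8. A sketch in Python (the filename is a placeholder):

```python
# Write a CSV using utf-8-sig so the file starts with a UTF-8 BOM.
rows = "City,Country\nMünchen,Deutschland\n東京,日本\n"

with open("cities-excel.csv", "w", encoding="utf-8-sig", newline="") as f:
    f.write(rows)

# The file now begins with the BOM bytes EF BB BF, then the UTF-8 text.
with open("cities-excel.csv", "rb") as f:
    print(f.read(3))  # b'\xef\xbb\xbf'
```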

Q: How is non-table wiki content handled?

A: The CSV conversion focuses on extracting tabular data. Non-table content such as headings, paragraphs, lists, templates, and references is not included in the CSV output, as CSV is a flat data format that cannot represent hierarchical document structures. If you need the full document content, consider converting to a format like TXT, HTML, or JSON instead.

Q: Can I use the CSV for data analysis with Python or R?

A: Yes, CSV is the most common format for data analysis. In Python, use pandas.read_csv() to load the data into a DataFrame. In R, use read.csv() to import it. Both tools handle the CSV format natively and provide powerful data manipulation, statistical analysis, and visualization capabilities for the extracted wiki data.
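A quick sketch of the pandas route, with columns mirroring the GDP example above; the `thousands` option converts quoted values like "25,462" to numbers:

```python
import io

import pandas as pd

# Inline CSV standing in for a converted wiki table.
data = """Country,GDP (Billion $),Population
United States,"25,462","331,900,000"
China,"17,963","1,412,600,000"
"""

# thousands="," parses the quoted, comma-grouped values as integers.
df = pd.read_csv(io.StringIO(data), thousands=",")
print(df["Population"].sum())  # 1744500000
```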