Convert MediaWiki to CSV

MediaWiki vs CSV Format Comparison

Aspect	MediaWiki (Source Format)	CSV (Target Format)
Format Overview	MediaWiki Markup Language. Wiki markup language developed by Magnus Manske and Lee Daniel Crocker for Wikipedia in 2002. Features == headings ==, '''bold''', ''italic'', [[links]], templates, and complex table markup using {| and |}. Used by Wikipedia, Fandom, and thousands of wikis for structured content.	Comma-Separated Values. Simple tabular data format where values are separated by commas and records by newlines. Defined in RFC 4180, CSV is the most universal format for exchanging structured data between spreadsheets, databases, and applications. Human-readable and supported by virtually every data tool.
Technical Specifications	Type: Wiki markup language; Encoding: UTF-8; MIME Type: text/x-wiki; Extensions: .mediawiki, .wiki, .txt; Structure: Hierarchical document; Data Model: Document-oriented	Type: Tabular data format; Encoding: UTF-8, ASCII, various; MIME Type: text/csv; Extensions: .csv; Structure: Rows and columns (2D table); Standard: RFC 4180
Syntax Examples	MediaWiki table markup: {| class="wikitable" |- ! Name !! Population !! Country |- | Paris || 2,161,000 || France |- | London || 8,982,000 || UK |- | Tokyo || 13,960,000 || Japan |}	CSV output: Name,Population,Country Paris,"2,161,000",France London,"8,982,000",UK Tokyo,"13,960,000",Japan
Content Support	Hierarchical document structure; Rich text formatting; Complex table layouts; Links and references; Templates and transclusion; Categories and metadata; Images and media; Nested content structures	Flat tabular data only; Plain text values; Comma-delimited columns; Newline-delimited rows; Quoted fields for special characters; Optional header row; No formatting or styling; No nested structures
Advantages	Rich structured content; Visual table rendering; Templates for data presentation; Links to related content; Powers Wikipedia tables; Collaborative editing	Universal compatibility; Opens in any spreadsheet; Importable to any database; Minimal file size; Simple and human-readable; Machine-parseable without special libraries; Standard data exchange format
Disadvantages	Complex table syntax; Data not easily extractable; Requires parser for data access; Mixed content and presentation; Not suitable for data analysis	No formatting or styling; No hierarchical data support; Commas in data need quoting; No data type information; No multi-sheet support; Encoding ambiguity
Common Uses	Wikipedia data tables; Knowledge base content; Reference and comparison tables; Wiki-based databases; Structured wiki content	Spreadsheet data exchange; Database import/export; Data analysis and reporting; Application data feeds; Bulk data processing
Best For	Visual data presentation; Structured wiki articles; Collaborative data editing; Contextual data with narrative	Data portability and exchange; Spreadsheet workflows; Database operations; Automated data processing
Version History	Introduced: 2002 (for Wikipedia); Creators: Magnus Manske, Lee Daniel Crocker; Status: Actively maintained; Evolution: Parsoid, VisualEditor, Lua	Introduced: Early 1970s (IBM mainframes); Standard: RFC 4180 (2005); Status: Stable, universal standard; Evolution: Minimal changes over decades
Software Support	MediaWiki: Native rendering; Pandoc: Read/write support; Editors: VisualEditor, WikiEditor; Other: Parsoid, wiki tools	Microsoft Excel: Native import/export; Google Sheets: Full support; Databases: All major RDBMS; Other: LibreOffice, Python pandas, R

Why Convert MediaWiki to CSV?

Converting MediaWiki markup to CSV format allows you to extract structured tabular data from wiki pages into a universally compatible spreadsheet format. Wikipedia and other wikis contain vast amounts of data stored in wiki tables that are rich in information but locked in wiki markup syntax. CSV conversion unlocks this data for use in spreadsheets like Excel and Google Sheets, databases, data analysis tools, and programming workflows.

MediaWiki tables use a markup syntax of their own: {| starts a table, |- starts a new row, ! starts a header cell (with !! separating header cells on the same line), | starts a data cell (with || separating data cells on the same line), and |} ends the table. While this syntax renders well on a wiki page, it is not directly usable by data tools. Converting to CSV transforms these wiki tables into clean comma-separated rows that any spreadsheet application, database system, or programming language can import directly.
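
The markers described above are enough to sketch a small parser. The following Python example is a minimal illustration, not the converter's actual implementation: the function names `parse_wikitable` and `rows_to_csv` are hypothetical, and the sketch assumes a simple table with no cell attributes, no nested tables, and one row of cells per line.

```python
import csv
import io


def parse_wikitable(markup: str) -> list[list[str]]:
    """Parse one simple MediaWiki table into a list of rows of cell strings."""
    rows = []
    current = None
    for raw in markup.splitlines():
        line = raw.strip()
        if line.startswith("{|") or line.startswith("|+"):
            continue  # table start (with attributes) or caption: no data
        if line.startswith("|}"):
            break  # table end
        if line.startswith("|-"):
            if current:  # close the previous row, if it had any cells
                rows.append(current)
            current = []
            continue
        if line.startswith("!"):
            cells = line[1:].split("!!")  # header cells
        elif line.startswith("|"):
            cells = line[1:].split("||")  # data cells
        else:
            continue  # anything else is ignored in this sketch
        if current is None:
            current = []
        current.extend(cell.strip() for cell in cells)
    if current:
        rows.append(current)
    return rows


def rows_to_csv(rows) -> str:
    """Serialize rows with the csv module, which quotes fields that contain commas."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


WIKITEXT = """{| class="wikitable"
|-
! Name !! Population !! Country
|-
| Paris || 2,161,000 || France
|-
| London || 8,982,000 || UK
|-
| Tokyo || 13,960,000 || Japan
|}"""

csv_text = rows_to_csv(parse_wikitable(WIKITEXT))
print(csv_text)
```

Real wiki pages often need more care than this (cell attributes, templates, cells that span several lines), so treat the sketch as a starting point.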

CSV (Comma-Separated Values) is one of the most widely supported data exchange formats. As described in RFC 4180, it represents tabular data as plain text with commas separating values and line breaks separating rows. Spreadsheet applications, databases, data analysis tools, and the standard libraries of most programming languages can read and write CSV. Converting wiki data to CSV opens up possibilities for sorting, filtering, charting, statistical analysis, and data-driven reporting.

This conversion is especially valuable for researchers, data analysts, and developers who need to work with data published on Wikipedia or other wikis. Geographic data, population statistics, historical timelines, comparison tables, and scientific datasets stored in wiki tables can be extracted to CSV for further analysis. The conversion strips formatting markup and preserves the raw data values.

Key Benefits of Converting MediaWiki to CSV:

Data Extraction: Extract tabular data from wiki pages for analysis and processing
Spreadsheet Ready: Open directly in Excel, Google Sheets, or LibreOffice Calc
Database Import: Load wiki data into MySQL, PostgreSQL, SQLite, or any RDBMS
Universal Format: Compatible with every data tool and programming language
Clean Data: Strip all wiki markup, leaving only raw data values
Data Analysis: Use with pandas, R, SPSS, or any analytical tool
Automation Friendly: Easy to process programmatically in batch workflows

Practical Examples

Example 1: Wikipedia Data Table to Spreadsheet

Input MediaWiki file (countries.mediawiki):

== Countries by GDP ==

{| class="wikitable sortable"
|-
! Country !! GDP (Billion $) !! Population !! Region
|-
| [[United States]] || 25,462 || 331,900,000 || North America
|-
| [[China]] || 17,963 || 1,412,600,000 || Asia
|-
| [[Japan]] || 4,231 || 125,700,000 || Asia
|-
| [[Germany]] || 4,072 || 83,200,000 || Europe
|}

Output CSV file (countries.csv):

Country,GDP (Billion $),Population,Region
United States,"25,462","331,900,000",North America
China,"17,963","1,412,600,000",Asia
Japan,"4,231","125,700,000",Asia
Germany,"4,072","83,200,000",Europe

Example 2: Wiki Comparison Table to Analysis Data

Input MediaWiki file (languages.mediawiki):

{| class="wikitable"
|-
! Language !! Paradigm !! Year !! Creator
|-
| [[Python (programming language)|Python]] || Multi-paradigm || 1991 || Guido van Rossum
|-
| [[JavaScript]] || Multi-paradigm || 1995 || Brendan Eich
|-
| [[Rust (programming language)|Rust]] || Multi-paradigm || 2015 || Graydon Hoare
|}

Output CSV file (languages.csv):

Language,Paradigm,Year,Creator
Python,Multi-paradigm,1991,Guido van Rossum
JavaScript,Multi-paradigm,1995,Brendan Eich
Rust,Multi-paradigm,2015,Graydon Hoare

Example 3: Wiki Content to Database Records

Input MediaWiki file (inventory.mediawiki):

== Product Inventory ==

{| class="wikitable"
|-
! SKU !! Product Name !! Price !! Stock !! Category
|-
| A001 || '''Widget Pro''' || $29.99 || 150 || Electronics
|-
| B002 || ''Gadget Mini'' || $14.50 || 320 || Accessories
|-
| C003 || Basic Tool Kit || $49.99 || 75 || Tools
|}

{{Updated|March 2026}}

Output CSV file (inventory.csv):

SKU,Product Name,Price,Stock,Category
A001,Widget Pro,$29.99,150,Electronics
B002,Gadget Mini,$14.50,320,Accessories
C003,Basic Tool Kit,$49.99,75,Tools
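
As an illustration of the "database records" step, the sketch below loads the inventory CSV above into an in-memory SQLite database using only the Python standard library. The table name, column names, and the `load_inventory` helper are assumptions chosen for this example; a real import would use your own schema.

```python
import csv
import io
import sqlite3

# The CSV output from the example above.
csv_text = """SKU,Product Name,Price,Stock,Category
A001,Widget Pro,$29.99,150,Electronics
B002,Gadget Mini,$14.50,320,Accessories
C003,Basic Tool Kit,$49.99,75,Tools
"""


def load_inventory(conn: sqlite3.Connection, text: str) -> int:
    """Create a hypothetical 'inventory' table and insert every CSV data row."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    conn.execute(
        "CREATE TABLE inventory ("
        "sku TEXT PRIMARY KEY, name TEXT, price REAL, stock INTEGER, category TEXT)"
    )
    rows = [
        # CSV carries no type information, so convert values explicitly.
        (sku, name, float(price.lstrip("$")), int(stock), category)
        for sku, name, price, stock, category in reader
    ]
    conn.executemany("INSERT INTO inventory VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
count = load_inventory(conn, csv_text)
low_stock = conn.execute(
    "SELECT sku FROM inventory WHERE stock < 100 ORDER BY sku"
).fetchall()
print(count, low_stock)
```

The explicit `float` and `int` conversions matter because every CSV field arrives as text.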

Frequently Asked Questions (FAQ)

Q: What data from MediaWiki is included in the CSV?

A: The converter extracts tabular data from MediaWiki tables (marked with {| and |} syntax). Table headers become the CSV header row, and each table row becomes a CSV data row. Wiki formatting markup (bold, italic, links) is stripped, leaving only the plain text content of each cell. Non-table content like headings and paragraphs is not included in the CSV output.

Q: What happens to wiki links in table cells?

A: Wiki links like [[United States]] are converted to their display text ("United States"). Piped links like [[Python (programming language)|Python]] use the display text ("Python"). External links are converted to their anchor text. This ensures the CSV contains clean, readable data values without any wiki markup syntax.
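
A rough sketch of this kind of clean-up, using regular expressions in Python. The `clean_cell` function is a hypothetical helper written for illustration, not the converter's own code; it handles only internal links, labeled external links, and bold/italic quotes, and leaves templates and unlabeled external links untouched.

```python
import re


def clean_cell(cell: str) -> str:
    """Reduce common inline wiki markup in a table cell to plain text."""
    # [[Target|Label]] -> Label, [[Target]] -> Target
    cell = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", cell)
    # [https://example.org Label] -> Label
    cell = re.sub(r"\[(?:https?:)?//\S+\s+([^\]]+)\]", r"\1", cell)
    # '''bold''' and ''italic'' -> plain text (runs of 2 to 5 apostrophes)
    cell = re.sub(r"'{2,5}", "", cell)
    return cell.strip()


print(clean_cell("[[Python (programming language)|Python]]"))
print(clean_cell("'''Widget Pro'''"))
```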

Q: How are commas in data handled?

A: Following RFC 4180, values containing commas, double quotes, or line breaks are enclosed in double quotes. For example, a population value of 1,412,600,000 is written as "1,412,600,000" in the CSV output. A double quote inside a quoted value is escaped by doubling it. This ensures proper parsing by CSV-compatible software.
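
Python's standard `csv` module applies these quoting rules by default, which makes it a convenient way to see them in action. The rows below are made-up illustration data, not output from the converter.

```python
import csv
import io

# Hypothetical rows: one value contains a double quote, one contains commas.
rows = [
    ["SKU", "Product Name", "Price"],
    ["A001", '27" Monitor', "1,299"],
]

buffer = io.StringIO()
# Defaults: quote only when needed (QUOTE_MINIMAL), double embedded quotes.
csv.writer(buffer, lineterminator="\n").writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back restores the original values.
round_trip = list(csv.reader(io.StringIO(csv_text)))
```

The second line of `csv_text` is `A001,"27"" Monitor","1,299"`: both special values are quoted, and the embedded quote is doubled.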

Q: Can I extract multiple tables from one MediaWiki page?

A: Yes. If the MediaWiki page contains multiple tables, the converter extracts data from all of them. Tables are either merged into a single CSV (if their columns match) or separated into distinct sections. For pages with tables of different structures, you may want to process them individually for the cleanest results.
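
If you prefer to split a page yourself before converting, one possible approach is sketched below: find each table block, then group the blocks by their header row so that tables with matching columns can be handled together. The function names are hypothetical, and the sketch assumes that {| and |} each start a line and that tables are not nested.

```python
import re

# A table block runs from a line starting with {| to a line starting with |}.
TABLE_RE = re.compile(r"^\{\|.*?^\|\}", re.DOTALL | re.MULTILINE)


def find_tables(page: str) -> list[str]:
    """Return the raw markup of every top-level table on the page."""
    return TABLE_RE.findall(page)


def header_of(table: str) -> tuple[str, ...]:
    """Return the cells of the first header line, or () if there is none."""
    for line in table.splitlines():
        if line.startswith("!"):
            return tuple(cell.strip() for cell in line[1:].split("!!"))
    return ()


def group_by_header(page: str) -> dict[tuple[str, ...], list[str]]:
    """Group table blocks whose header rows are identical."""
    groups: dict[tuple[str, ...], list[str]] = {}
    for table in find_tables(page):
        groups.setdefault(header_of(table), []).append(table)
    return groups


PAGE = """== Languages ==
{| class="wikitable"
|-
! Language !! Year
|-
| Python || 1991
|}

Some text between tables.

{| class="wikitable"
|-
! Language !! Year
|-
| Rust || 2015
|}

{| class="wikitable"
|-
! SKU !! Price
|-
| A001 || $29.99
|}
"""

groups = group_by_header(PAGE)
for header, tables in groups.items():
    print(header, len(tables))
```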

Q: What about merged cells or colspan in wiki tables?

A: CSV format does not support merged cells. When a MediaWiki table contains colspan or rowspan attributes, the converter expands the merged cells by repeating the value across the appropriate columns or rows. This maintains the data grid structure that CSV requires while preserving all information from the original table.
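
The expansion can be pictured with the following sketch. It is an illustration under stated assumptions rather than the converter's code: `split_attrs` and `expand_spans` are hypothetical helpers, each row is given as a list of raw cell strings, and attributes are written in the usual `colspan="2" | text` form.

```python
import re

SPAN_RE = re.compile(r'(colspan|rowspan)\s*=\s*"?(\d+)"?')


def split_attrs(cell: str):
    """Return (text, colspan, rowspan) for a raw cell such as 'colspan="2" | Asia'."""
    attrs, sep, text = cell.partition("|")
    if not sep or "=" not in attrs:
        return cell.strip(), 1, 1  # no attribute part
    spans = {name: int(count) for name, count in SPAN_RE.findall(attrs)}
    return text.strip(), spans.get("colspan", 1), spans.get("rowspan", 1)


def expand_spans(rows):
    """Repeat spanned values so that every row has one value per column."""
    grid = []
    pending = {}  # column index -> (text, number of rows still to fill)
    for row in rows:
        out = []
        cells = iter(row)
        col = 0
        while True:
            if col in pending:  # column is covered by a rowspan from above
                text, left = pending[col]
                out.append(text)
                if left > 1:
                    pending[col] = (text, left - 1)
                else:
                    del pending[col]
                col += 1
                continue
            cell = next(cells, None)
            if cell is None:
                break
            text, colspan, rowspan = split_attrs(cell)
            for _ in range(colspan):  # repeat across spanned columns
                out.append(text)
                if rowspan > 1:
                    pending[col] = (text, rowspan - 1)
                col += 1
        grid.append(out)
    return grid


rows = [
    ["Region", "Country", "Population"],
    ['rowspan="2" | Asia', "China", "1,412,600,000"],
    ["Japan", "125,700,000"],
    ['colspan="2" | Total listed', "1,538,300,000"],
]
grid = expand_spans(rows)
for line in grid:
    print(line)
```

In the result, "Asia" appears in both the China and Japan rows, and "Total listed" fills both of the columns it spanned.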

Q: Can I open the CSV in Excel or Google Sheets?

A: Absolutely! CSV is natively supported by Microsoft Excel, Google Sheets, LibreOffice Calc, Apple Numbers, and virtually every spreadsheet application. Simply open the .csv file or import it. For proper Unicode support (especially with non-Latin characters), some applications may require selecting UTF-8 encoding during import.
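
If you generate or re-save the CSV yourself, one common workaround for spreadsheet encoding problems is to write the file as UTF-8 with a byte order mark, which Excel generally uses to detect the encoding. The sketch below uses Python's `utf-8-sig` codec; the rows and the `write_csv_for_excel` name are illustrative assumptions.

```python
import csv
import os
import tempfile

# Hypothetical rows containing non-Latin and accented characters.
rows = [
    ["City", "Local name"],
    ["Tokyo", "東京"],
    ["Munich", "München"],
]


def write_csv_for_excel(path: str, rows) -> None:
    """Write rows as UTF-8 with a leading byte order mark (BOM)."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        csv.writer(handle).writerows(rows)


path = os.path.join(tempfile.mkdtemp(), "cities.csv")
write_csv_for_excel(path, rows)

# Reading with the same codec strips the BOM again.
with open(path, newline="", encoding="utf-8-sig") as handle:
    read_back = list(csv.reader(handle))
```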

Q: How is non-table wiki content handled?

A: The CSV conversion focuses on extracting tabular data. Non-table content such as headings, paragraphs, lists, templates, and references is not included in the CSV output, as CSV is a flat data format that cannot represent hierarchical document structures. If you need the full document content, consider converting to a format like TXT, HTML, or JSON instead.

Q: Can I use the CSV for data analysis with Python or R?

A: Yes, CSV is one of the most common formats for data analysis. In Python, use pandas.read_csv() to load the data into a DataFrame. In R, use read.csv() to import it. Both tools handle the CSV format natively and provide powerful data manipulation, statistical analysis, and visualization capabilities for the extracted wiki data. Note that numbers written with thousands separators, such as "25,462", are read as text unless you tell the tool how to parse them.
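
As a small dependency-free illustration, the sketch below analyzes the countries CSV from Example 1 with Python's standard `csv` module; the `to_int` helper is an assumption made for this example. With pandas, passing `thousands=","` to `pandas.read_csv()` handles the comma-grouped numbers in a similar way.

```python
import csv
import io

# The CSV output from Example 1.
csv_text = """Country,GDP (Billion $),Population,Region
United States,"25,462","331,900,000",North America
China,"17,963","1,412,600,000",Asia
Japan,"4,231","125,700,000",Asia
Germany,"4,072","83,200,000",Europe
"""


def to_int(value: str) -> int:
    """Convert a number written with thousands separators, e.g. '25,462'."""
    return int(value.replace(",", ""))


records = list(csv.DictReader(io.StringIO(csv_text)))

# Sum GDP per region.
gdp_by_region = {}
for record in records:
    region = record["Region"]
    gdp_by_region[region] = gdp_by_region.get(region, 0) + to_int(record["GDP (Billion $)"])

print(gdp_by_region)
```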