Convert MediaWiki to TSV


MediaWiki vs TSV Format Comparison

Each aspect below lists MediaWiki (the source format) first, then TSV (the target format).

Format Overview

MediaWiki: MediaWiki Markup Language
Type: wiki markup, plain text

Lightweight markup language created for Wikipedia in 2002 and used by all MediaWiki-powered wikis. Uses distinctive syntax with == headings ==, '''bold''', ''italic'', [[links]], and {| tables |} for collaborative web content creation and editing.

TSV: Tab-Separated Values
Type: tabular data, plain text

Simple tabular data format where columns are separated by tab characters and rows by newlines. Widely used for data exchange between spreadsheets, databases, and statistical software. Simpler than CSV as tab characters rarely appear in data, reducing the need for quoting and escaping.

Technical Specifications

MediaWiki:
Structure: Plain text with wiki markup
Encoding: UTF-8
Format: Text-based markup language
Compression: None (plain text)
Extensions: .mediawiki, .wiki, .txt

TSV:
Structure: Tab-delimited rows and columns
Encoding: UTF-8 or ASCII
Format: Flat tabular data
Compression: None (plain text)
Extensions: .tsv, .tab, .txt

Syntax Examples

MediaWiki uses wiki-style tables:

{| class="wikitable"
|-
! Name !! Age !! City
|-
| Alice || 30 || New York
|-
| Bob || 25 || London
|}

TSV uses tabs between columns:

Name	Age	City
Alice	30	New York
Bob	25	London

Content Support

MediaWiki:
  • Section headings (levels 1-6)
  • Bold and italic formatting (underline via HTML <u> tags)
  • Bulleted and numbered lists
  • Wiki-style tables
  • Internal and external links
  • Image embedding via file references
  • Categories and templates
  • Table of contents (auto-generated)
  • References and citations
  • Infoboxes and navboxes

TSV:
  • Plain text values in columns
  • Header row (optional)
  • Numeric and text data
  • Unicode text content
  • Multiple rows of data
  • No formatting or styling
  • No nested structures
  • No formulas or calculations

Advantages

MediaWiki:
  • Powers Wikipedia and thousands of wikis
  • Built-in linking and categorization
  • Collaborative editing support
  • Auto-generated table of contents
  • Template and transclusion system
  • Version history tracking

TSV:
  • Universally compatible with spreadsheets
  • No quoting issues (tabs in data are rare)
  • Simpler than CSV for most data
  • Easy to parse programmatically
  • Imports directly into databases
  • Minimal file size overhead

Disadvantages

MediaWiki:
  • Complex table syntax
  • Requires MediaWiki software (or a converter such as Pandoc) to render
  • Not widely used outside wikis
  • Template syntax can be confusing
  • No native print layout support

TSV:
  • No text formatting support
  • Flat structure only (no nesting)
  • Tab characters in data cause issues
  • No detailed formal specification (only a brief IANA media type registration)
  • No metadata or schema definition

Common Uses

MediaWiki:
  • Wikipedia articles and pages
  • Corporate wikis and knowledge bases
  • Technical documentation wikis
  • Community-driven encyclopedias
  • Open-source project documentation

TSV:
  • Data exchange between applications
  • Spreadsheet imports and exports
  • Database bulk loading
  • Scientific data files
  • Bioinformatics data formats
  • Log file analysis

Best For

MediaWiki:
  • Wiki-based content publishing
  • Collaborative documentation
  • Knowledge base articles
  • Wikipedia contributions

TSV:
  • Data that contains commas
  • Quick spreadsheet data exchange
  • Database import/export operations
  • Scientific and research data

Version History

MediaWiki:
Introduced: 2002 (MediaWiki 1.0)
Current Version: MediaWiki 1.42 (2024)
Status: Actively maintained and developed
Evolution: Regular updates with new features

TSV:
Introduced: Early computing era
Standard: IANA media type: text/tab-separated-values
Status: Stable, universally supported
Evolution: De facto standard, no formal versioning

Software Support

MediaWiki:
MediaWiki: Native rendering engine
Wikipedia: Primary content format
Pandoc: Reads and writes MediaWiki markup
Other: Any text editor for source editing

TSV:
Microsoft Excel: Full import/export support
Google Sheets: Opens and imports TSV files
LibreOffice Calc: Full support
Other: Python, R, databases, and most other data tools

Why Convert MediaWiki to TSV?

Converting MediaWiki markup to TSV format is useful when you need to extract tabular data from wiki pages for analysis, import into spreadsheets, or loading into databases. Wikipedia and other MediaWiki-based wikis contain vast amounts of structured data in wiki tables, and converting these to TSV makes the data accessible to standard data processing tools like Excel, Google Sheets, Python pandas, and SQL databases.

MediaWiki tables use a markup syntax built from {| for table start, |- for row separators, ! for header cells (with !! between header cells on the same line), | for data cells (with || between data cells on the same line), and |} for table end. This syntax renders well in a web browser, but it is not suitable for data analysis or programmatic processing. Converting to TSV strips away the wiki markup and presents the raw data in a clean, tab-delimited format that spreadsheet applications and data tools can import directly.
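
As a rough illustration of the table syntax described above, the following Python sketch parses one simple wikitable into rows and writes them as TSV. It is not the converter's actual implementation: the function names `wikitable_to_rows` and `rows_to_tsv` are hypothetical, and the sketch assumes one row of cells per line, with no cell attributes, nested tables, or multi-line cells.

```python
def wikitable_to_rows(wikitext: str) -> list[list[str]]:
    """Parse the first simple wikitable in `wikitext` into a list of rows.

    Assumption: cells for a row sit on one line, separated by "!!" or "||".
    """
    rows: list[list[str]] = []
    current: list[str] = []
    in_table = False
    for raw in wikitext.splitlines():
        line = raw.strip()
        if line.startswith("{|"):        # table start
            in_table = True
        elif not in_table:
            continue                     # ignore everything outside the table
        elif line.startswith("|}"):      # table end
            break
        elif line.startswith("|+"):      # caption line, not a data cell
            continue
        elif line.startswith("|-"):      # row separator
            if current:
                rows.append(current)
            current = []
        elif line.startswith("!"):       # header cells, split on "!!"
            current.extend(c.strip() for c in line[1:].split("!!"))
        elif line.startswith("|"):       # data cells, split on "||"
            current.extend(c.strip() for c in line[1:].split("||"))
    if current:
        rows.append(current)
    return rows


def rows_to_tsv(rows: list[list[str]]) -> str:
    """Join cells with tabs and rows with newlines."""
    return "".join("\t".join(row) + "\n" for row in rows)


if __name__ == "__main__":
    sample = (
        '{| class="wikitable"\n|-\n! Name !! Age !! City\n'
        "|-\n| Alice || 30 || New York\n|-\n| Bob || 25 || London\n|}"
    )
    print(rows_to_tsv(wikitable_to_rows(sample)), end="")
```

Real wiki tables often carry cell attributes, templates, and links, so a production converter needs considerably more handling than this.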

TSV is often preferred over CSV for wiki data extraction because wiki content frequently contains commas within cell values (descriptions, lists, addresses), which would require complex quoting in CSV. Tab characters, on the other hand, rarely appear in natural text, making TSV a cleaner choice for data that originates from human-written wiki content. The resulting files import cleanly into Excel, Google Sheets, and database tools without column misalignment issues.
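
The quoting difference can be seen with Python's standard `csv` module. This is a small sketch; the helper name `serialize` is made up for illustration, and the sample row is taken from the countries example on this page.

```python
import csv
import io


def serialize(rows, delimiter):
    """Write rows with the given delimiter and return the resulting text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


rows = [["United States", "334,000,000", "Washington, D.C."]]

# With a comma delimiter, fields that contain commas must be quoted.
print(serialize(rows, ","), end="")

# With a tab delimiter, the same fields are written unquoted.
print(serialize(rows, "\t"), end="")
```

The tab-delimited line stays free of quotes only as long as no cell contains a tab, a newline, or a double quote character.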

This conversion is particularly valuable for researchers, data analysts, and content managers who need to work with data published on Wikipedia or internal wikis. Rather than manually copying table data cell by cell, converting the entire MediaWiki source to TSV automates the extraction process and preserves all rows and columns accurately, including header information.

Key Benefits of Converting MediaWiki to TSV:

  • Data Extraction: Pull tabular data from wiki pages into analysis-ready format
  • Spreadsheet Import: Open directly in Excel, Google Sheets, and LibreOffice Calc
  • Database Loading: Import wiki data into SQL databases with bulk load operations
  • No Quoting Issues: TSV avoids the comma conflicts common in wiki text
  • Data Analysis: Process wiki data with Python, R, or other analytical tools
  • Clean Output: Strips all wiki formatting to reveal pure tabular data
  • Batch Processing: Convert multiple wiki pages with tables in one operation

Practical Examples

Example 1: Wikipedia Data Table Extraction

Input MediaWiki file (countries.mediawiki):

== Countries by Population ==

{| class="wikitable sortable"
|-
! Country !! Population !! Capital !! Continent
|-
| [[China]] || 1,412,000,000 || Beijing || Asia
|-
| [[India]] || 1,408,000,000 || New Delhi || Asia
|-
| [[United States]] || 334,000,000 || Washington, D.C. || North America
|-
| [[Brazil]] || 216,000,000 || Brasilia || South America
|}

Output TSV file (countries.tsv):

Country	Population	Capital	Continent
China	1,412,000,000	Beijing	Asia
India	1,408,000,000	New Delhi	Asia
United States	334,000,000	Washington, D.C.	North America
Brazil	216,000,000	Brasilia	South America
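
Once the table is in TSV form, it can be read with standard tools. The sketch below uses Python's built-in `csv` module on the output shown above. The function name `load_population` is hypothetical, and the thousands separators in the Population column are removed before converting to integers.

```python
import csv
import io

# The contents of countries.tsv from the example above.
tsv_text = (
    "Country\tPopulation\tCapital\tContinent\n"
    "China\t1,412,000,000\tBeijing\tAsia\n"
    "India\t1,408,000,000\tNew Delhi\tAsia\n"
    "United States\t334,000,000\tWashington, D.C.\tNorth America\n"
    "Brazil\t216,000,000\tBrasilia\tSouth America\n"
)


def load_population(tsv: str) -> dict[str, int]:
    """Map each country to its population as an integer."""
    reader = csv.DictReader(io.StringIO(tsv), delimiter="\t", quoting=csv.QUOTE_NONE)
    return {
        row["Country"]: int(row["Population"].replace(",", ""))
        for row in reader
    }


populations = load_population(tsv_text)
print(populations["Brazil"])
```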

Example 2: Software Comparison Table

Input MediaWiki file (comparison.mediawiki):

== Database Comparison ==

{| class="wikitable"
|-
! Database !! Type !! License !! Latest Version
|-
| '''PostgreSQL''' || Relational || Open Source || 16.2
|-
| '''MySQL''' || Relational || GPL/Commercial || 8.3
|-
| '''MongoDB''' || Document || SSPL || 7.0
|-
| '''Redis''' || Key-Value || BSD || 7.2
|}

Output TSV file (comparison.tsv):

Database	Type	License	Latest Version
PostgreSQL	Relational	Open Source	16.2
MySQL	Relational	GPL/Commercial	8.3
MongoDB	Document	SSPL	7.0
Redis	Key-Value	BSD	7.2

Example 3: Project Status Tracking

Input MediaWiki file (status.mediawiki):

== Sprint Tasks ==

{| class="wikitable"
|-
! Task !! Assignee !! Priority !! Status
|-
| Implement login API || Alice || High || Complete
|-
| Design dashboard || Bob || Medium || In Progress
|-
| Write unit tests || Carol || High || Pending
|}

{{Note|Sprint ends on Friday.}}

Output TSV file (status.tsv):

Task	Assignee	Priority	Status
Implement login API	Alice	High	Complete
Design dashboard	Bob	Medium	In Progress
Write unit tests	Carol	High	Pending

Frequently Asked Questions (FAQ)

Q: What is TSV format?

A: TSV (Tab-Separated Values) is a simple text format for storing tabular data. Each row is a line of text, and columns within a row are separated by tab characters. It is similar to CSV but uses tabs instead of commas as delimiters, which makes it simpler to handle data that contains commas. TSV files can be opened in any spreadsheet application or text editor.

Q: How are MediaWiki tables extracted into TSV?

A: The converter parses MediaWiki table syntax ({| ... |}), identifies header rows (! cells) and data rows (| cells), strips all wiki markup formatting, and outputs each cell value separated by tab characters. Multiple tables in a single wiki page can be extracted into separate sections or files.
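
As an illustration of how a page with several tables might be split before each table is converted, here is a minimal Python sketch. It is an assumption about one possible approach, not the converter's documented behavior: the name `find_tables` is hypothetical, and nested tables are not handled.

```python
import re

# A table block starts with "{|" at the beginning of a line and runs to the
# next line that begins with "|}". Nested tables would break this pattern.
TABLE_RE = re.compile(r"^\{\|.*?^\|\}", re.MULTILINE | re.DOTALL)


def find_tables(wikitext: str) -> list[str]:
    """Return the source text of each top-level table, in page order."""
    return TABLE_RE.findall(wikitext)


page = (
    "== First ==\n"
    '{| class="wikitable"\n|-\n! X\n|-\n| 1\n|}\n'
    "\nSome text between the tables.\n\n"
    "{|\n|-\n! Y\n|-\n| 2\n|}\n"
)

for index, table in enumerate(find_tables(page), start=1):
    print(f"table {index}: {len(table.splitlines())} lines")
```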

Q: Why use TSV instead of CSV for wiki data?

A: Wiki content frequently contains commas in cell values (city names like "Washington, D.C.", lists, descriptions). Using TSV avoids the need for complex quoting rules that CSV requires for comma-containing data. Since tab characters rarely appear in natural wiki text, TSV provides cleaner, more reliable data extraction from wiki tables.

Q: Can I open TSV files in Excel?

A: Yes! Microsoft Excel, Google Sheets, LibreOffice Calc, and most other spreadsheet applications can open TSV files directly. In Excel, you can use File > Open and select the TSV file, or use the Text Import Wizard to specify tab as the delimiter. The data will be automatically arranged into columns.

Q: What happens to non-table content in the MediaWiki file?

A: The converter focuses on extracting tabular data. Non-table content such as headings, paragraphs, and lists may be converted into tab-separated text, with section headings kept as context markers and list items written as individual rows. In the examples on this page, where each file's main content is a single table, only the table rows appear in the output.

Q: Is wiki formatting stripped from cell values?

A: Yes, all MediaWiki markup is removed from cell values during conversion. Bold markers (''' '''), italic markers ('' ''), link syntax ([[...]]), and other wiki formatting are stripped, leaving only the plain text content. This ensures clean data that can be processed by spreadsheet formulas and database queries.
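
The kind of stripping described here can be sketched with a few regular expressions in Python. This is a simplified illustration with a hypothetical function name, `strip_wiki_markup`. It handles only bold, italic, internal links, and labeled external links, and leaves templates and HTML tags untouched.

```python
import re


def strip_wiki_markup(cell: str) -> str:
    """Remove common inline wiki markup from one cell value."""
    # [[Target|Label]] -> Label, [[Target]] -> Target
    cell = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", cell)
    # [https://example.org Label] -> Label
    cell = re.sub(r"\[https?://\S+\s+([^\]]+)\]", r"\1", cell)
    # '''bold''' and ''italic'': drop runs of two or more apostrophes,
    # which leaves single apostrophes such as the one in "It's" alone.
    cell = re.sub(r"'{2,}", "", cell)
    return cell.strip()


for value in ["[[China]]", "'''PostgreSQL'''", "[[New York City|NYC]]"]:
    print(strip_wiki_markup(value))
```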

Q: Can I import TSV data into a database?

A: Absolutely! Tab-delimited text is a standard format for bulk data import in most database systems. PostgreSQL (the COPY command, whose default text format is tab-delimited), MySQL (LOAD DATA INFILE, which expects tab-separated fields by default), SQLite (the .import command in tabs mode), and other databases support direct TSV import. This makes it easy to load wiki table data into a database for querying and analysis.
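
For a small-scale illustration, the sketch below loads TSV text into an in-memory SQLite database using only Python's standard library. The function name `load_tsv` and the table name `databases` are made up for this example, and the header row is assumed to come from a trusted source because it is used to build the column names.

```python
import csv
import io
import sqlite3


def load_tsv(conn: sqlite3.Connection, table: str, tsv_text: str) -> int:
    """Create `table` from the TSV header row, insert the data rows,
    and return the number of rows now in the table."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    # Quote identifiers so names with spaces, like "Latest Version", work.
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


# The contents of comparison.tsv from Example 2.
comparison_tsv = (
    "Database\tType\tLicense\tLatest Version\n"
    "PostgreSQL\tRelational\tOpen Source\t16.2\n"
    "MySQL\tRelational\tGPL/Commercial\t8.3\n"
    "MongoDB\tDocument\tSSPL\t7.0\n"
    "Redis\tKey-Value\tBSD\t7.2\n"
)

conn = sqlite3.connect(":memory:")
print(load_tsv(conn, "databases", comparison_tsv))
print(conn.execute(
    'SELECT "Database" FROM databases WHERE "Type" = ?', ("Relational",)
).fetchall())
```

For large files, the native bulk loaders named above are usually a better fit than row-by-row inserts from a script.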

Q: Can I convert multiple MediaWiki files to TSV at once?

A: Yes! Our converter supports batch processing. Upload multiple MediaWiki files simultaneously and each will be converted to its own TSV file. This is ideal for extracting tabular data from multiple Wikipedia articles or internal wiki pages in a single operation.