Convert MediaWiki to TSV
Maximum file size: 100 MB.
MediaWiki vs TSV Format Comparison
| Aspect | MediaWiki (Source Format) | TSV (Target Format) |
|---|---|---|
| Format Overview | MediaWiki (MediaWiki Markup Language)<br>Lightweight markup language created for Wikipedia in 2002 and used by all MediaWiki-powered wikis. Uses distinctive syntax with == headings ==, '''bold''', ''italic'', [[links]], and {\| tables \|} for collaborative web content creation and editing.<br>Wiki Markup, Plain Text | TSV (Tab-Separated Values)<br>Simple tabular data format where columns are separated by tab characters and rows by newlines. Widely used for data exchange between spreadsheets, databases, and statistical software. Simpler than CSV as tab characters rarely appear in data, reducing the need for quoting and escaping.<br>Tabular Data, Plain Text |
| Technical Specifications | Structure: Plain text with wiki markup<br>Encoding: UTF-8<br>Format: Text-based markup language<br>Compression: None (plain text)<br>Extensions: .mediawiki, .wiki, .txt | Structure: Tab-delimited rows and columns<br>Encoding: UTF-8 or ASCII<br>Format: Flat tabular data<br>Compression: None (plain text)<br>Extensions: .tsv, .tab, .txt |
| Syntax Examples | MediaWiki uses wiki-style tables:<br>{\| class="wikitable"<br>\|-<br>! Name !! Age !! City<br>\|-<br>\| Alice \|\| 30 \|\| New York<br>\|-<br>\| Bob \|\| 25 \|\| London<br>\|} | TSV uses tabs between columns:<br>Name	Age	City<br>Alice	30	New York<br>Bob	25	London |
| Content Support | | |
| Advantages | | |
| Disadvantages | | |
| Common Uses | | |
| Best For | | |
| Version History | Introduced: 2002 (MediaWiki 1.0)<br>Current Version: MediaWiki 1.42 (2024)<br>Status: Actively maintained and developed<br>Evolution: Regular updates with new features | Introduced: Early computing era<br>Standard: IANA media type text/tab-separated-values<br>Status: Stable, universally supported<br>Evolution: De facto standard, no formal versioning |
| Software Support | MediaWiki: Native rendering engine<br>Wikipedia: Primary content format<br>Pandoc: Full conversion support<br>Other: Any text editor for source editing | Microsoft Excel: Full import/export support<br>Google Sheets: Native open support<br>LibreOffice Calc: Full support<br>Other: Python, R, databases, all data tools |
Why Convert MediaWiki to TSV?
Converting MediaWiki markup to TSV format is essential when you need to extract tabular data from wiki pages for analysis, import into spreadsheets, or load into databases. Wikipedia and MediaWiki-based wikis contain vast amounts of structured data in wiki tables, and converting these to TSV makes the data accessible to standard data processing tools like Excel, Google Sheets, Python pandas, and SQL databases.
MediaWiki tables use a markup syntax built from several delimiters: {| opens a table and |} closes it, |- separates rows, ! starts a header cell (with !! between header cells on the same line), and | starts a data cell (with || between data cells on the same line). This syntax renders well in a web browser, but it is awkward for data analysis or programmatic processing. TSV format strips away the wiki markup and presents the raw data in a clean, tab-delimited format that spreadsheet applications and data tools can import directly.
TSV is often preferred over CSV for wiki data extraction because wiki content frequently contains commas within cell values (descriptions, lists, addresses), which would require complex quoting in CSV. Tab characters, on the other hand, rarely appear in natural text, making TSV a cleaner choice for data that originates from human-written wiki content. The resulting files import cleanly into Excel, Google Sheets, and database tools without column misalignment issues.
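The quoting difference can be seen with Python's standard csv module. This is a small illustrative sketch, not the converter's own code; the helper name serialize is made up for the example, and the row is taken from the countries table in Example 1.

```python
import csv
import io

# One row from the countries example; two fields contain commas.
row = ["United States", "334,000,000", "Washington, D.C.", "North America"]

def serialize(fields, delimiter):
    """Write one row with the csv module and return the resulting line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(fields)
    return buffer.getvalue()

csv_line = serialize(row, ",")   # comma-containing fields get wrapped in double quotes
tsv_line = serialize(row, "\t")  # no field contains a tab, so nothing is quoted

print(csv_line, end="")
print(tsv_line, end="")
```

With a comma delimiter the writer has to quote the population and capital fields. With a tab delimiter the same row is written as-is.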
This conversion is particularly valuable for researchers, data analysts, and content managers who need to work with data published on Wikipedia or internal wikis. Rather than manually copying table data cell by cell, converting the entire MediaWiki source to TSV automates the extraction process and preserves all rows and columns accurately, including header information.
Key Benefits of Converting MediaWiki to TSV:
- Data Extraction: Pull tabular data from wiki pages into analysis-ready format
- Spreadsheet Import: Open directly in Excel, Google Sheets, and LibreOffice Calc
- Database Loading: Import wiki data into SQL databases with bulk load operations
- No Quoting Issues: TSV avoids the comma conflicts common in wiki text
- Data Analysis: Process wiki data with Python, R, or other analytical tools
- Clean Output: Strips all wiki formatting to reveal pure tabular data
- Batch Processing: Convert multiple wiki pages with tables in one operation
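As a rough illustration of the data analysis benefit, the sketch below reads TSV content with Python's standard csv module and totals one column. It assumes output shaped like countries.tsv from Example 1 below; the inline string stands in for the file, and the helper to_int is a name invented for this example.

```python
import csv
import io

# Inline stand-in for the file; with a real file, pass
# open("countries.tsv", newline="", encoding="utf-8") instead of io.StringIO(...).
tsv_text = (
    "Country\tPopulation\tCapital\tContinent\n"
    "China\t1,412,000,000\tBeijing\tAsia\n"
    "India\t1,408,000,000\tNew Delhi\tAsia\n"
    "United States\t334,000,000\tWashington, D.C.\tNorth America\n"
    "Brazil\t216,000,000\tBrasilia\tSouth America\n"
)

# QUOTE_NONE treats double quotes as ordinary characters, which matches plain TSV.
reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
rows = list(reader)

def to_int(value):
    """Convert a number written with thousands separators, e.g. '334,000,000'."""
    return int(value.replace(",", ""))

asia_total = sum(to_int(r["Population"]) for r in rows if r["Continent"] == "Asia")
print(asia_total)
```

Note that the population values keep their thousands separators in the TSV, so they arrive as text and need cleaning before any arithmetic.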
Practical Examples
Example 1: Wikipedia Data Table Extraction
Input MediaWiki file (countries.mediawiki):
== Countries by Population ==
{| class="wikitable sortable"
|-
! Country !! Population !! Capital !! Continent
|-
| [[China]] || 1,412,000,000 || Beijing || Asia
|-
| [[India]] || 1,408,000,000 || New Delhi || Asia
|-
| [[United States]] || 334,000,000 || Washington, D.C. || North America
|-
| [[Brazil]] || 216,000,000 || Brasilia || South America
|}
Output TSV file (countries.tsv):
Country	Population	Capital	Continent
China	1,412,000,000	Beijing	Asia
India	1,408,000,000	New Delhi	Asia
United States	334,000,000	Washington, D.C.	North America
Brazil	216,000,000	Brasilia	South America
Example 2: Software Comparison Table
Input MediaWiki file (comparison.mediawiki):
== Database Comparison ==
{| class="wikitable"
|-
! Database !! Type !! License !! Latest Version
|-
| '''PostgreSQL''' || Relational || Open Source || 16.2
|-
| '''MySQL''' || Relational || GPL/Commercial || 8.3
|-
| '''MongoDB''' || Document || SSPL || 7.0
|-
| '''Redis''' || Key-Value || BSD || 7.2
|}
Output TSV file (comparison.tsv):
Database	Type	License	Latest Version
PostgreSQL	Relational	Open Source	16.2
MySQL	Relational	GPL/Commercial	8.3
MongoDB	Document	SSPL	7.0
Redis	Key-Value	BSD	7.2
Example 3: Project Status Tracking
Input MediaWiki file (status.mediawiki):
== Sprint Tasks ==
{| class="wikitable"
|-
! Task !! Assignee !! Priority !! Status
|-
| Implement login API || Alice || High || Complete
|-
| Design dashboard || Bob || Medium || In Progress
|-
| Write unit tests || Carol || High || Pending
|}
{{Note|Sprint ends on Friday.}}
Output TSV file (status.tsv):
Task	Assignee	Priority	Status
Implement login API	Alice	High	Complete
Design dashboard	Bob	Medium	In Progress
Write unit tests	Carol	High	Pending
Frequently Asked Questions (FAQ)
Q: What is TSV format?
A: TSV (Tab-Separated Values) is a simple text format for storing tabular data. Each row is a line of text, and columns within a row are separated by tab characters. It is similar to CSV but uses tabs instead of commas as delimiters, which makes it simpler to handle data that contains commas. TSV files can be opened in any spreadsheet application or text editor.
Q: How are MediaWiki tables extracted into TSV?
A: The converter parses MediaWiki table syntax ({| ... |}), identifies header rows (! cells) and data rows (| cells), strips all wiki markup formatting, and outputs each cell value separated by tab characters. Multiple tables in a single wiki page can be extracted into separate sections or files.
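The parsing steps described in this answer can be sketched in Python as follows. This is a simplified illustration, not the converter's actual implementation: the function names wikitable_to_rows and rows_to_tsv are hypothetical, and the sketch assumes simple tables without cell attributes, nested tables, or cells that span several lines. It does not strip inline markup from cell values.

```python
def wikitable_to_rows(text):
    """Return a list of tables; each table is a list of rows of cell strings."""
    tables = []
    rows = None       # None while we are outside a table
    current = []      # cells collected for the row being read
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("{|"):            # table start
            rows, current = [], []
        elif rows is None:
            continue                         # headings, paragraphs, templates
        elif line.startswith("|}"):          # table end
            if current:
                rows.append(current)
            tables.append(rows)
            rows, current = None, []
        elif line.startswith("|-"):          # row separator
            if current:
                rows.append(current)
            current = []
        elif line.startswith("|+"):          # caption line, ignored here
            continue
        elif line.startswith("!"):           # header cells, split on !!
            current.extend(c.strip() for c in line[1:].split("!!"))
        elif line.startswith("|"):           # data cells, split on ||
            current.extend(c.strip() for c in line[1:].split("||"))
    return tables

def rows_to_tsv(rows):
    """Join cells with tabs and rows with newlines."""
    return "\n".join("\t".join(cells) for cells in rows) + "\n"

sample = """== Sprint Tasks ==
{| class="wikitable"
|-
! Task !! Assignee !! Priority !! Status
|-
| Implement login API || Alice || High || Complete
|-
| Design dashboard || Bob || Medium || In Progress
|}
{{Note|Sprint ends on Friday.}}
"""

tables = wikitable_to_rows(sample)
tsv_output = rows_to_tsv(tables[0])
print(tsv_output, end="")
```

The order of the checks matters: |}, |-, and |+ all begin with a pipe, so they are tested before the generic data-cell case.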
Q: Why use TSV instead of CSV for wiki data?
A: Wiki content frequently contains commas in cell values (city names like "Washington, D.C.", lists, descriptions). Using TSV avoids the need for complex quoting rules that CSV requires for comma-containing data. Since tab characters rarely appear in natural wiki text, TSV provides cleaner, more reliable data extraction from wiki tables.
Q: Can I open TSV files in Excel?
A: Yes! Microsoft Excel, Google Sheets, LibreOffice Calc, and most other spreadsheet applications can open TSV files directly. In Excel, you can use File > Open and select the TSV file, or use the Text Import Wizard to specify tab as the delimiter. The data will be automatically arranged into columns.
Q: What happens to non-table content in the MediaWiki file?
A: The converter focuses on extracting tabular data. In the examples above, content outside tables (the section headings and the {{Note}} template) does not appear in the TSV output. Where non-table content is carried over, section headings may be preserved as context markers, and list items can be represented as individual rows.
Q: Is wiki formatting stripped from cell values?
A: Yes, all MediaWiki markup is removed from cell values during conversion. Bold markers (''' '''), italic markers ('' ''), link syntax ([[...]]), and other wiki formatting are stripped, leaving only the plain text content. This ensures clean data that can be processed by spreadsheet formulas and database queries.
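A minimal sketch of this kind of stripping, using regular expressions in Python, is shown below. It is an assumption about how such cleaning could work, not the converter's code; the function name strip_wiki_markup is invented, and only links, bold, and italic are handled (templates, HTML tags, and references are out of scope).

```python
import re

def strip_wiki_markup(cell):
    """Reduce common inline wiki markup to plain text (sketch, not exhaustive)."""
    # [[Target|Label]] becomes Label; [[Target]] becomes Target.
    cell = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", cell)
    # '''bold''' and ''italic'': remove any run of two or more apostrophes.
    cell = re.sub(r"'{2,}", "", cell)
    return cell.strip()

cleaned = [
    strip_wiki_markup("[[China]]"),
    strip_wiki_markup("[[United States|USA]]"),
    strip_wiki_markup("'''PostgreSQL'''"),
    strip_wiki_markup("''italic'' and '''bold'''"),
]
print(cleaned)
```

One trade-off of the apostrophe rule is that a genuine run of two apostrophes in cell text would also be removed; a fuller parser would pair opening and closing markers instead.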
Q: Can I import TSV data into a database?
A: Absolutely! TSV is one of the standard formats for bulk data import in most database systems. PostgreSQL (COPY command), MySQL (LOAD DATA INFILE), SQLite, and other databases support direct TSV import. This makes it easy to load wiki table data into a database for querying and analysis.
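As one hedged example of such an import, the sketch below loads TSV rows into an in-memory SQLite database using only Python's standard library. The table name wiki_table is made up for illustration, the data mirrors comparison.tsv from Example 2, and every column is stored as TEXT for simplicity.

```python
import csv
import io
import sqlite3

# Inline stand-in for comparison.tsv; a real file would be opened with
# open("comparison.tsv", newline="", encoding="utf-8").
tsv_text = (
    "Database\tType\tLicense\tLatest Version\n"
    "PostgreSQL\tRelational\tOpen Source\t16.2\n"
    "MySQL\tRelational\tGPL/Commercial\t8.3\n"
    "MongoDB\tDocument\tSSPL\t7.0\n"
    "Redis\tKey-Value\tBSD\t7.2\n"
)

reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
header = next(reader)

conn = sqlite3.connect(":memory:")
# Column names come from the TSV header; they are double-quoted because
# "Latest Version" contains a space.
columns = ", ".join('"{}" TEXT'.format(name.replace('"', '""')) for name in header)
conn.execute(f"CREATE TABLE wiki_table ({columns})")

placeholders = ", ".join("?" for _ in header)
conn.executemany(f"INSERT INTO wiki_table VALUES ({placeholders})", reader)
conn.commit()

query = 'SELECT "Database" FROM wiki_table WHERE "Type" = ? ORDER BY "Database"'
relational = [r[0] for r in conn.execute(query, ("Relational",))]
print(relational)
```

For large files, the bulk commands named above (PostgreSQL COPY, MySQL LOAD DATA INFILE) are usually faster than row-by-row inserts from a script.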
Q: Can I convert multiple MediaWiki files to TSV at once?
A: Yes! Our converter supports batch processing. Upload multiple MediaWiki files simultaneously and each will be converted to its own TSV file. This is ideal for extracting tabular data from multiple Wikipedia articles or internal wiki pages in a single operation.