Convert EPUB3 to TSV
Maximum file size: 100 MB.
EPUB3 vs TSV Format Comparison
| Aspect | EPUB3 (Source Format) | TSV (Target Format) |
|---|---|---|
| Format Overview |
EPUB3
Electronic Publication 3.0
EPUB3 is the modern e-book standard maintained by the W3C, supporting HTML5, CSS3, JavaScript, MathML, and SVG. It enables rich, interactive digital publications with multimedia content, accessibility features, and responsive layouts across devices.
E-Book Standard, HTML5-Based |
TSV
Tab-Separated Values
TSV is a simple tabular data format where columns are separated by tab characters and rows by newlines. It is widely used for data exchange between spreadsheet applications, databases, and data processing tools due to its simplicity and broad software support.
Tabular Data, Plain Text |
| Technical Specifications |
Structure: ZIP container with XHTML5, CSS3, multimedia
Encoding: UTF-8 (recommended) or UTF-16
Format: Open standard based on web technologies
Standard: W3C EPUB 3.3 specification
Extensions: .epub |
Structure: Tab-delimited rows and columns
Encoding: UTF-8, ASCII, or system encoding
Format: Plain text with tab delimiters
Standard: IANA text/tab-separated-values
Extensions: .tsv, .tab |
| Syntax Examples |
EPUB3 uses XHTML5 content documents:
<html xmlns:epub="...">
<head><title>Chapter 1</title></head>
<body>
<section epub:type="chapter">
<h1>Introduction</h1>
<p>Content text here...</p>
</section>
</body>
</html>
|
TSV uses tabs between columns:
chapter	title	content
1	Introduction	Content text here...
2	Background	More content here...
3	Methods	Method descriptions... |
| Version History |
Introduced: 2011 (EPUB 3.0)
Based On: EPUB 2.0 (2007), OEB (1999)
Current Version: EPUB 3.3 (W3C Recommendation, 2023)
Status: Actively maintained by W3C |
Introduced: 1960s–1970s (with early computing)
MIME Type: text/tab-separated-values (IANA)
Current Version: No formal version (stable convention)
Status: Universally supported standard |
| Software Support |
Readers: Apple Books, Kobo, Calibre, Thorium
Editors: Sigil, Calibre
Validators: EPUBCheck
Libraries: epub.js, Readium
Converters: Calibre, Pandoc, Adobe InDesign |
Spreadsheets: Excel, Google Sheets, LibreOffice Calc
Languages: Python (csv module), R, Perl, Java
Databases: MySQL LOAD DATA, PostgreSQL COPY
Tools: awk, cut, pandas, data.table |
Why Convert EPUB3 to TSV?
Converting EPUB3 e-books to TSV (Tab-Separated Values) format is essential when you need to analyze book content in spreadsheet applications or data processing pipelines. TSV provides a clean tabular representation of book data that opens directly in Excel, Google Sheets, and LibreOffice Calc.
TSV is often preferred over CSV for text-heavy data because tab characters rarely appear in natural text, which avoids most of the quoting and escaping issues that affect CSV files containing commas in content. This makes TSV particularly suitable for storing book chapter text alongside metadata columns.
This conversion is valuable for publishers analyzing their e-book catalogs, researchers studying text corpora, and data scientists preparing e-book content for analysis. The tabular format makes it easy to sort, filter, and aggregate book data using standard spreadsheet formulas or programming tools.
The converter extracts book metadata and chapter content into a structured table with columns for chapter number, title, content, word count, and other attributes. This tabular representation enables quantitative analysis of book structure and content that would be difficult with the raw EPUB3 format.
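Before any chapter text can be tabulated, the content documents have to be located inside the EPUB3 ZIP container. The following is a minimal sketch of that lookup using only the Python standard library. It is not the converter's actual implementation, which is not documented here. The function names `spine_documents` and `build_demo_epub` are illustrative, and the in-memory archive is a stripped-down stand-in rather than a complete, valid EPUB.

```python
import io
import posixpath
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

OPF_XML = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="c2" href="ch02.xhtml" media-type="application/xhtml+xml"/>
    <item id="c1" href="ch01.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="c1"/>
    <itemref idref="c2"/>
  </spine>
</package>"""


def spine_documents(epub_file):
    """Return the content document paths inside the ZIP, in reading order."""
    with zipfile.ZipFile(epub_file) as archive:
        # META-INF/container.xml points at the package (OPF) document.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")
        package = ET.fromstring(archive.read(opf_path))
    # The manifest maps ids to files; the spine lists ids in reading order.
    manifest = {
        item.get("id"): item.get("href")
        for item in package.findall("opf:manifest/opf:item", OPF_NS)
    }
    base = posixpath.dirname(opf_path)
    return [
        posixpath.join(base, manifest[ref.get("idref")])
        for ref in package.findall("opf:spine/opf:itemref", OPF_NS)
    ]


def build_demo_epub():
    """Build a tiny in-memory archive (hypothetical, for demonstration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("META-INF/container.xml", CONTAINER_XML)
        archive.writestr("OEBPS/content.opf", OPF_XML)
    buffer.seek(0)
    return buffer


documents = spine_documents(build_demo_epub())
print(documents)
```

Reading order comes from the spine rather than from the manifest or the file names, which is why the sketch resolves each `itemref` through the manifest.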
Key Benefits of Converting EPUB3 to TSV:
- Spreadsheet Ready: Opens directly in Excel, Google Sheets, and LibreOffice
- No Quoting Issues: Tabs rarely appear in text, avoiding CSV escaping problems
- Data Analysis: Easy to process with pandas, R, and other data tools
- Database Import: Direct import into MySQL, PostgreSQL, and SQLite
- Content Analytics: Analyze word counts, chapter lengths, and text patterns
- Catalog Building: Create structured book catalogs and indexes
- Simple Format: No special parser needed; splitting each line on tabs is enough
Practical Examples
Example 1: Chapter Content as Tabular Data
Input EPUB3 file (book.epub) — chapters:
<section epub:type="chapter">
<h1>The Beginning</h1>
<p>Our story starts in a small town.</p>
</section>
<section epub:type="chapter">
<h1>The Journey</h1>
<p>They traveled across the mountains.</p>
</section>
Output TSV file (book.tsv):
chapter_num	title	content	word_count
1	The Beginning	Our story starts in a small town.	7
2	The Journey	They traveled across the mountains.	5
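One way this chapter extraction could be implemented is sketched here with the Python standard library. It assumes the sections sit in a well-formed XHTML document that declares the XHTML and EPUB namespaces, and it counts words by splitting on whitespace, which is only one possible definition of `word_count`. The names `chapters_to_rows`, `rows_to_tsv`, and `SAMPLE` are illustrative and are not part of the converter.

```python
import csv
import io
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
EPUB_NS = "http://www.idpf.org/2007/ops"

# Hypothetical content document wrapping the two chapter sections.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<body>
<section epub:type="chapter"><h1>The Beginning</h1><p>Our story starts in a small town.</p></section>
<section epub:type="chapter"><h1>The Journey</h1><p>They traveled across the mountains.</p></section>
</body>
</html>"""


def chapters_to_rows(xhtml_text):
    """Return one (chapter_num, title, content, word_count) tuple per chapter."""
    root = ET.fromstring(xhtml_text)
    rows = []
    for section in root.iter(f"{{{XHTML_NS}}}section"):
        if section.get(f"{{{EPUB_NS}}}type") != "chapter":
            continue
        heading = section.find(f"{{{XHTML_NS}}}h1")
        title = "".join(heading.itertext()).strip() if heading is not None else ""
        # Collapse whitespace so no tab or newline reaches a TSV cell.
        paragraphs = [
            " ".join("".join(p.itertext()).split())
            for p in section.iter(f"{{{XHTML_NS}}}p")
        ]
        content = " ".join(paragraphs)
        rows.append((len(rows) + 1, title, content, len(content.split())))
    return rows


def rows_to_tsv(rows):
    """Serialize the rows with a header line, one record per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["chapter_num", "title", "content", "word_count"])
    writer.writerows(rows)
    return buffer.getvalue()


tsv_text = rows_to_tsv(chapters_to_rows(SAMPLE))
print(tsv_text, end="")
```

Joining paragraphs with a single space keeps each chapter on one line. A converter that needs to preserve paragraph breaks would have to escape them instead.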
Example 2: Book Metadata Export
Input EPUB3 file (collection.epub) — metadata:
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>Modern Architecture</dc:title>
<dc:creator>Sarah Builder</dc:creator>
<dc:language>en</dc:language>
<dc:date>2024-06-01</dc:date>
<dc:publisher>Design Press</dc:publisher>
<dc:identifier>978-0-123456-78-9</dc:identifier>
</metadata>
Output TSV file (collection.tsv):
field	value
title	Modern Architecture
creator	Sarah Builder
language	en
date	2024-06-01
publisher	Design Press
identifier	978-0-123456-78-9
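A field/value export like this can be approximated in a few lines of Python. The sketch below assumes the metadata element is available as a standalone XML string and keeps only Dublin Core (`dc:`) children. `metadata_to_rows` and `metadata_to_tsv` are hypothetical helper names, not the converter's API.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

SAMPLE_METADATA = """<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Modern Architecture</dc:title>
  <dc:creator>Sarah Builder</dc:creator>
  <dc:language>en</dc:language>
  <dc:date>2024-06-01</dc:date>
  <dc:publisher>Design Press</dc:publisher>
  <dc:identifier>978-0-123456-78-9</dc:identifier>
</metadata>"""


def metadata_to_rows(metadata_xml):
    """Return (field, value) pairs for every Dublin Core child element."""
    root = ET.fromstring(metadata_xml)
    prefix = f"{{{DC_NS}}}"
    rows = []
    for element in root:
        if element.tag.startswith(prefix):
            field = element.tag[len(prefix):]
            # Collapse whitespace so values cannot contain tabs or newlines.
            value = " ".join((element.text or "").split())
            rows.append((field, value))
    return rows


def metadata_to_tsv(rows):
    """Serialize the pairs under a 'field<TAB>value' header line."""
    lines = ["field\tvalue"]
    lines += [f"{field}\t{value}" for field, value in rows]
    return "\n".join(lines) + "\n"


metadata_rows = metadata_to_rows(SAMPLE_METADATA)
print(metadata_to_tsv(metadata_rows), end="")
```

The long (field, value) layout handles repeated elements such as multiple `dc:creator` entries without changing the column set.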
Example 3: Table of Contents as TSV
Input EPUB3 file (guide.epub) — navigation:
<nav epub:type="toc">
<ol>
<li><a href="intro.xhtml">Introduction</a></li>
<li><a href="ch01.xhtml">Installation</a></li>
<li><a href="ch02.xhtml">Configuration</a></li>
<li><a href="appendix.xhtml">Appendix A</a></li>
</ol>
</nav>
Output TSV file (guide.tsv):
order	label	href	level
1	Introduction	intro.xhtml	1
2	Installation	ch01.xhtml	1
3	Configuration	ch02.xhtml	1
4	Appendix A	appendix.xhtml	1
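The `level` column suggests that nested lists in the navigation document are flattened with their depth recorded. A possible sketch of that traversal follows, assuming the nav element declares the XHTML and EPUB namespaces. `toc_to_rows` and `walk` are illustrative names rather than the converter's own code.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"

SAMPLE_NAV = """<nav xmlns="http://www.w3.org/1999/xhtml"
     xmlns:epub="http://www.idpf.org/2007/ops" epub:type="toc">
<ol>
<li><a href="intro.xhtml">Introduction</a></li>
<li><a href="ch01.xhtml">Installation</a></li>
<li><a href="ch02.xhtml">Configuration</a></li>
<li><a href="appendix.xhtml">Appendix A</a></li>
</ol>
</nav>"""


def walk(ol, level, rows):
    """Append (order, label, href, level) for each entry, depth first."""
    for li in ol.findall(f"{XHTML}li"):
        link = li.find(f"{XHTML}a")
        if link is not None:
            label = " ".join("".join(link.itertext()).split())
            rows.append((len(rows) + 1, label, link.get("href", ""), level))
        nested = li.find(f"{XHTML}ol")
        if nested is not None:
            walk(nested, level + 1, rows)


def toc_to_rows(nav_xml):
    """Flatten the first nav element with epub:type="toc" into rows."""
    root = ET.fromstring(nav_xml)
    rows = []
    for nav in root.iter(f"{XHTML}nav"):
        if nav.get(EPUB_TYPE) == "toc":
            top = nav.find(f"{XHTML}ol")
            if top is not None:
                walk(top, 1, rows)
            break
    return rows


toc_rows = toc_to_rows(SAMPLE_NAV)
for row in toc_rows:
    print("\t".join(str(value) for value in row))
```

Because `order` is assigned during a depth-first walk, sorting the TSV by that column reproduces the original reading sequence even when sub-entries are present.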
Frequently Asked Questions (FAQ)
Q: What is TSV format?
A: TSV (Tab-Separated Values) is a plain text format for tabular data where columns are separated by tab characters (ASCII 9) and rows by newlines. It is similar to CSV but uses tabs instead of commas, which avoids escaping issues since tabs rarely appear in natural text content.
Q: Why choose TSV over CSV for e-book content?
A: TSV is preferred for e-book content because book text frequently contains commas (which cause issues in CSV), while tabs rarely appear in natural language. TSV avoids the need for quoting and escaping, making the output cleaner and easier to process with simple text tools like awk and cut.
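The difference can be seen by serializing the same row both ways with Python's `csv` module. This is a small illustration under the assumption that the text contains commas but no tabs, quotes, or newlines. The sample row and the `serialize` helper are made up for the demonstration.

```python
import csv
import io

# Hypothetical chapter row whose content contains commas.
row = ["1", "The Beginning", "Our story, as it happens, starts in a small town."]


def serialize(fields, delimiter):
    """Write one record with the given delimiter and return it as text."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter, lineterminator="\n").writerow(fields)
    return buffer.getvalue()


as_csv = serialize(row, ",")
as_tsv = serialize(row, "\t")

# The CSV writer has to quote the content field; the TSV line needs no quoting.
print(as_csv, end="")
print(as_tsv, end="")

# A naive split recovers the TSV fields but breaks the CSV record apart.
naive_tsv = as_tsv.rstrip("\n").split("\t")
naive_csv = as_csv.rstrip("\n").split(",")
print(len(naive_tsv), len(naive_csv))
```

The naive split only stays safe for TSV as long as tabs and newlines are removed or escaped before the fields are written.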
Q: Can I open TSV files in Excel?
A: Yes. Microsoft Excel, Google Sheets, and LibreOffice Calc all open TSV files. In Excel, use File > Open and select the TSV file; if the Text Import Wizard appears, choose Delimited and confirm Tab as the delimiter. Google Sheets accepts TSV files through its import dialog.
Q: How is the EPUB3 content organized in the TSV?
A: The converter creates a header row with column names followed by data rows. Typical columns include chapter number, title, content (text), word count, and source file name. Each chapter or section becomes a separate row, making it easy to analyze book structure in a spreadsheet.
Q: What happens to HTML formatting in the content?
A: HTML tags are stripped from the content, leaving plain text in the TSV cells. Bold, italic, and other formatting is removed because TSV is a plain text format. Paragraph breaks within content are either replaced with spaces or kept as escaped newline characters (\n).
Q: Can I import the TSV into a database?
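A: Yes, most major databases can import tab-delimited files. MySQL uses LOAD DATA INFILE, whose default field terminator is a tab. PostgreSQL uses COPY, whose default text format is tab-delimited (there is no dedicated TSV format option; FORMAT csv with a tab DELIMITER is the alternative). SQLite uses .import after setting .mode tabs. This makes TSV a convenient intermediate format for loading e-book content into databases.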
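For a scripted import, the same load can be done from Python with the standard `sqlite3` and `csv` modules. The sketch below is one possible approach. The table name `chapters`, the three-column layout, and the `load_tsv` helper are assumptions chosen for the example.

```python
import csv
import io
import sqlite3

# Hypothetical TSV export with a header row.
TSV_TEXT = (
    "chapter_num\ttitle\tcontent\n"
    "1\tThe Beginning\tOur story starts in a small town.\n"
    "2\tThe Journey\tThey traveled across the mountains.\n"
)

EXPECTED_HEADER = ["chapter_num", "title", "content"]


def load_tsv(connection, tsv_text):
    """Create the chapters table and insert every data row from the TSV."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    if header != EXPECTED_HEADER:
        raise ValueError(f"unexpected columns: {header}")
    connection.execute(
        "CREATE TABLE chapters (chapter_num INTEGER, title TEXT, content TEXT)"
    )
    # Parameter binding avoids building SQL strings from book text.
    connection.executemany("INSERT INTO chapters VALUES (?, ?, ?)", reader)
    connection.commit()


connection = sqlite3.connect(":memory:")
load_tsv(connection, TSV_TEXT)
titles = [
    row[0]
    for row in connection.execute("SELECT title FROM chapters ORDER BY chapter_num")
]
print(titles)
```

`quoting=csv.QUOTE_NONE` treats quote characters in the book text as ordinary data, which matches a TSV file written without quoting.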
Q: How are newlines within chapter content handled?
A: Since TSV uses newlines as row delimiters, embedded newlines within content fields are escaped (replaced with \n) or the content is wrapped in appropriate quoting. This ensures that each row represents exactly one data record without breaking the tabular structure.
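A backslash-escaping scheme of this kind might look like the following sketch. The exact convention the converter uses is not specified here, so the mapping below (backslash, tab, and newline) is an assumption, and `escape_field` and `unescape_field` are hypothetical names.

```python
def escape_field(text):
    """Make a value safe for one TSV cell by escaping \\, tab, and newlines."""
    return (
        text.replace("\\", "\\\\")  # escape backslashes first
        .replace("\t", "\\t")
        .replace("\r\n", "\\n")  # Windows line endings collapse to \n
        .replace("\n", "\\n")
        .replace("\r", "\\n")
    )


def unescape_field(text):
    """Reverse escape_field by scanning left to right."""
    mapping = {"n": "\n", "t": "\t", "\\": "\\"}
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            following = next(chars, "")
            # Unknown escapes are kept as written.
            out.append(mapping.get(following, "\\" + following))
        else:
            out.append(ch)
    return "".join(out)


original = "Line one.\nLine two\twith a tab and a literal \\n marker."
escaped = escape_field(original)
print(escaped)
print(unescape_field(escaped) == original)
```

Escaping the backslash itself is what keeps a literal `\n` in the book text distinguishable from an escaped line break. Carriage returns are normalized to `\n`, so that part of the round trip is lossy by design.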
Q: What is the maximum data size TSV can handle?
A: TSV itself has no size limit: it is plain text that can grow as large as your storage allows. Practical limits depend on the software reading it. Excel supports 1,048,576 rows per worksheet; pandas and R load data into memory by default, so they are bounded by available RAM unless the file is read in chunks; databases can hold far larger tables. Because a TSV export contains only text, even large e-book collections usually produce files of manageable size.
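When a file is too large to load at once, it can be processed one row at a time. The following is a minimal sketch with the standard `csv` module. It assumes a `word_count` column exists, and the sample data and `total_words` helper are invented for the illustration.

```python
import csv
import io

# Hypothetical export; in practice this would be an open file handle.
SAMPLE_TSV = (
    "chapter_num\ttitle\tword_count\n"
    "1\tIntroduction\t1200\n"
    "2\tBackground\t950\n"
    "3\tMethods\t1430\n"
)


def total_words(lines):
    """Sum the word_count column while holding only one row in memory."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return sum(int(row["word_count"]) for row in reader)


total = total_words(io.StringIO(SAMPLE_TSV))
print(total)
```

For a file on disk, the same function accepts a handle opened with `open(path, newline="", encoding="utf-8")`, so memory use stays flat regardless of file size.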