Convert TSV to HEX

Drag and drop files here or click to select.
Max file size: 100 MB.

TSV vs HEX Format Comparison

Each aspect below is described for both TSV (the source format) and HEX (the target format).
Format Overview
TSV
Tab-Separated Values

Plain text format for storing tabular data where columns are separated by tab characters. Clipboard-native and widely used in bioinformatics, genomics, and data pipelines. Simpler than CSV because tabs rarely appear in data, largely avoiding quoting issues.

Tabular Data Clipboard-Native
HEX
Hexadecimal Representation

A byte-level representation of file content using hexadecimal notation (base-16). Each byte is displayed as two hex digits (00-FF). Used for debugging, data inspection, binary analysis, and low-level data examination. Essential for understanding file encoding and detecting hidden characters.

Binary Data Debugging
Technical Specifications
TSV:
  Structure: Rows separated by newlines, columns by tabs
  Delimiter: Tab character (U+0009)
  Encoding: UTF-8, ASCII, or UTF-16
  Headers: Optional first row as column names
  Extensions: .tsv, .tab
HEX:
  Structure: Offset + hex bytes + ASCII representation
  Base: Base-16 (digits 0-9, letters A-F)
  Byte Size: 2 hex characters per byte
  Layout: Typically 16 bytes per line
  Extensions: .hex, .txt
Syntax Examples

TSV uses tab-separated columns:

Name	Age	City
Alice	30	New York
Bob	25	London

HEX shows bytes in hexadecimal:

00000000  4e 61 6d 65 09 41 67 65  |Name.Age|
00000008  09 43 69 74 79 0a 41 6c  |.City.Al|
00000010  69 63 65 09 33 30 09 4e  |ice.30.N|
00000018  65 77 20 59 6f 72 6b 0a  |ew York.|
Content Support
TSV:
  • Tabular data with rows and columns
  • Text, numbers, and dates
  • No quoting needed (tabs rarely in data)
  • Direct clipboard paste support
  • Large datasets (millions of rows)
  • Bioinformatics standard (BED, GFF, VCF)
HEX:
  • Exact byte-level representation
  • Reveals hidden/control characters
  • Shows encoding (UTF-8 BOM, line endings)
  • Offset addressing for byte location
  • ASCII sidebar for human readability
  • Works with any file type or encoding
Advantages
TSV:
  • No quoting issues (tabs rarely appear in data)
  • Clipboard-native: paste directly from spreadsheets
  • Standard in bioinformatics and genomics
  • Simpler parsing than CSV
  • Human-readable in any text editor
  • Minimal overhead and tiny file size
HEX:
  • Reveals exact byte content of any file
  • Essential for debugging encoding issues
  • Shows tab characters (09) and line endings
  • Detects BOM markers and hidden characters
  • Universal -- works with any file format
  • Critical tool for data forensics
Disadvantages
TSV:
  • No formatting or styling
  • No data types (everything is text)
  • Tab characters in data can break parsing
  • No multi-sheet support
  • No metadata or schema definition
HEX:
  • Not human-readable for data content
  • File size increases significantly (roughly 4x for a formatted dump)
  • No semantic structure or meaning
  • Requires hex knowledge to interpret
  • Not suitable for data exchange
Common Uses
TSV:
  • Bioinformatics data exchange (BED, GFF)
  • Clipboard copy/paste between apps
  • Database exports and imports
  • Scientific data tables
  • Log file analysis and processing
HEX:
  • Debugging file encoding issues
  • Inspecting binary file headers
  • Verifying data integrity
  • Forensic data analysis
  • Detecting hidden characters
  • Reverse engineering file formats
Best For
TSV:
  • Clipboard-based data transfer
  • Bioinformatics and genomics workflows
  • Simple tabular data without quoting hassles
  • Unix/Linux data processing pipelines
HEX:
  • Debugging TSV encoding problems
  • Verifying tab delimiters are correct
  • Inspecting non-printable characters
  • Low-level data analysis
Version History
TSV:
  Origin: Early Unix tools (cut, paste, awk)
  IANA Registration: text/tab-separated-values
  Status: Widely used, stable
  MIME Type: text/tab-separated-values
HEX:
  Origin: Early computing (hex dumps since 1960s)
  Tools: xxd, hexdump, od (Unix)
  Status: Fundamental computing tool
  Related: Intel HEX, Motorola S-Record
Software Support
TSV:
  Spreadsheets: Excel, Google Sheets, LibreOffice
  Text Editors: Any text editor (Notepad, VS Code)
  Programming: Python (csv module), R, pandas
  Other: Unix tools (cut, awk, sort), databases
HEX:
  CLI Tools: xxd, hexdump, od (Unix/Linux/macOS)
  Editors: HxD, Hex Fiend, 010 Editor
  IDEs: VS Code (Hex Editor extension)
  Programming: Python (binascii), any language

Why Convert TSV to HEX?

Converting TSV to hexadecimal representation reveals the exact byte-level content of your tab-delimited data. This is invaluable for debugging encoding issues, verifying that tab characters are correctly placed, detecting hidden BOM (Byte Order Mark) characters, and inspecting line ending formats (CR, LF, or CRLF). When a TSV file behaves unexpectedly, a hex dump is often the fastest way to identify the root cause.

TSV files are particularly interesting to inspect in hex because you can clearly see the tab delimiter (byte 09) separating each column. This makes it easy to verify that your TSV file is correctly formatted -- you can confirm that delimiters are actual tab characters and not spaces, that line endings are consistent, and that the encoding is what you expect (UTF-8, ASCII, or UTF-16).

In bioinformatics, where TSV is the standard data format, hex inspection is crucial for troubleshooting data pipeline issues. When a genomic data file fails to parse, a hex dump can reveal invisible characters, wrong line endings from Windows/Unix conversion, or encoding mismatches. Our converter produces a clean hex dump with offset addresses and ASCII sidebar for easy analysis.

The hex output follows the standard hex dump format used by tools like xxd and hexdump: each line shows the byte offset, 16 bytes in hexadecimal, and the ASCII representation of those bytes. Non-printable characters (including the tab delimiter 0x09) are shown as dots in the ASCII column, making them easy to spot.
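The layout described above can be reproduced in a few lines. The sketch below is a minimal, illustrative implementation of an xxd/hexdump-style dump (not this converter's actual code): offset on the left, up to 16 hex bytes in the middle, and an ASCII sidebar with dots for non-printable bytes, including the tab delimiter 0x09.

```python
# Minimal sketch of an xxd/hexdump-style dump: offset, 16 bytes per line,
# ASCII sidebar with dots for non-printables. Illustrative, not the
# converter's actual implementation.

def hex_dump(data: bytes, width: int = 16) -> str:
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Two hex digits per byte, with an extra gap after the first 8 bytes.
        parts = [f"{b:02x}" for b in chunk]
        hex_col = (" ".join(parts[:8]) + "  " + " ".join(parts[8:])).ljust(48)
        # Printable ASCII (0x20-0x7E) is shown as-is; everything else as a dot.
        ascii_col = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_col}  |{ascii_col}|")
    return "\n".join(lines)

print(hex_dump(b"ID\tName\tValue\n"))
```

Running this on the header row of a small TSV file shows the tab bytes (09) and line feed (0a) rendered as dots in the sidebar.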

Key Benefits of Converting TSV to HEX:

  • Debug Encoding: Identify UTF-8 BOM, encoding issues, and character set problems
  • Verify Delimiters: Confirm tabs (0x09) are used, not spaces or other characters
  • Inspect Line Endings: Detect CR (0x0D), LF (0x0A), or CRLF differences
  • Find Hidden Characters: Reveal zero-width spaces, non-breaking spaces, and control chars
  • Byte-Level View: See the exact binary content of every cell
  • Standard Format: Output compatible with xxd, hexdump, and hex editors
  • Data Forensics: Essential for investigating parsing failures and data corruption

Practical Examples

Example 1: Debugging Tab Delimiters

Input TSV file (data.tsv):

ID	Name	Value
1	Alpha	100
2	Beta	200

Output HEX file (data.hex):

00000000  49 44 09 4e 61 6d 65 09  56 61 6c 75 65 0a 31 09  |ID.Name.Value.1.|
00000010  41 6c 70 68 61 09 31 30  30 0a 32 09 42 65 74 61  |Alpha.100.2.Beta|
00000020  09 32 30 30 0a                                    |.200.|

Tab characters visible as 09, line feeds as 0a.
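The same delimiter check can be automated. This is a quick sketch (function name is illustrative) that works on the raw bytes: every row of a well-formed TSV should contain the same number of real tab bytes (0x09), and that number should be non-zero.

```python
# Sketch: confirm every row uses real tab bytes (0x09) and has a consistent
# column count. Illustrative helper, not part of the converter itself.

def check_tsv_delimiters(data: bytes) -> bool:
    rows = data.rstrip(b"\n").split(b"\n")
    tab_counts = {row.count(b"\t") for row in rows}
    # A well-formed TSV has the same tab count (columns - 1) on every row.
    return len(tab_counts) == 1 and tab_counts.pop() > 0

print(check_tsv_delimiters(b"ID\tName\tValue\n1\tAlpha\t100\n2\tBeta\t200\n"))  # True
print(check_tsv_delimiters(b"ID Name Value\n1 Alpha 100\n"))  # False: spaces, not tabs
```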

Example 2: Detecting BOM and Encoding

Input TSV file (utf8_bom.tsv):

(UTF-8 file with BOM; content:)
Name	City
Anna	Zürich

Output HEX file (utf8_bom.hex):

00000000  ef bb bf 4e 61 6d 65 09  43 69 74 79 0a 41 6e 6e  |...Name.City.Ann|
00000010  61 09 5a c3 bc 72 69 63  68 0a                    |a.Z..rich.|

BOM detected: ef bb bf (UTF-8 BOM)
UTF-8 umlaut ü: c3 bc (two bytes for the character 'ü')

Example 3: Windows vs Unix Line Endings

Input TSV file (windows.tsv):

(Windows line endings - CRLF:)
A	B
1	2

Output HEX file (windows.hex):

00000000  41 09 42 0d 0a 31 09 32  0d 0a                    |A.B..1.2..|

Windows CRLF line endings visible as 0d 0a (CR+LF).
Unix would show only 0a (LF).
Tab delimiters clearly visible as 09.
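The CRLF-vs-LF distinction shown in this example can be checked programmatically by counting byte pairs. A small sketch, with an illustrative function name:

```python
# Sketch: classify line endings by counting CRLF (0d 0a) vs bare LF (0a),
# mirroring what the hex dump makes visible.

def detect_line_endings(data: bytes) -> str:
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # LFs not preceded by CR
    if crlf and lf:
        return "mixed"
    return "CRLF" if crlf else "LF"

print(detect_line_endings(b"A\tB\r\n1\t2\r\n"))  # CRLF
print(detect_line_endings(b"A\tB\n1\t2\n"))      # LF
```

A "mixed" result usually means the file passed through both Windows and Unix tools, a common cause of parsing failures.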

Frequently Asked Questions (FAQ)

Q: What is a hex dump?

A: A hex dump is a representation of a file's binary content using hexadecimal notation. Each byte is shown as two hex digits (00-FF). A typical hex dump shows the byte offset on the left, hex bytes in the middle, and ASCII interpretation on the right. This format is used by tools like xxd, hexdump, and hex editors to inspect files at the byte level.

Q: Why would I need to convert TSV to HEX?

A: The most common reason is debugging. When a TSV file won't parse correctly, a hex dump reveals the exact bytes in the file. You can verify that tab characters (0x09) are actual tabs and not spaces, check line endings (LF vs CRLF), detect BOM markers, and find hidden characters that are invisible in text editors. This is especially important in bioinformatics pipelines where data integrity is critical.

Q: How do I identify tab characters in the hex output?

A: Tab characters appear as the hex byte 09 in the output. In the ASCII column on the right side of the hex dump, tabs are typically shown as dots (.) since they are non-printable characters. If you see spaces (hex 20) where you expect tabs, your file may not be a valid TSV file.

Q: What does BOM look like in hex?

A: The UTF-8 BOM (Byte Order Mark) appears as the three bytes EF BB BF at the very beginning of the file. UTF-16 Little Endian BOM is FF FE, and UTF-16 Big Endian is FE FF. BOMs can cause parsing issues in TSV files if the parser doesn't handle them, and a hex dump is the quickest way to detect them.

Q: How can I tell if my TSV has Windows or Unix line endings?

A: In the hex output, Unix line endings appear as 0A (LF - Line Feed), while Windows line endings appear as 0D 0A (CR+LF - Carriage Return + Line Feed). Old Mac line endings use 0D (CR only). Mixed line endings in a single file are a common source of parsing problems, and hex dumps make them easy to identify.

Q: Is the hex output reversible back to TSV?

A: Yes! A hex dump can be converted back to the original binary file using tools like xxd -r (reverse). The hex representation is a lossless encoding of the original data. However, the primary purpose of TSV to HEX conversion is inspection and debugging, not data transformation.
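For a dump in the exact layout shown in the examples above (offset, hex column, |ASCII| sidebar), the reversal can be sketched by parsing out the middle column, similar in spirit to `xxd -r`. The function name is illustrative:

```python
# Sketch: recover the original bytes from a dump in the layout shown above
# (offset, hex bytes, |ASCII| sidebar). Assumes that exact layout.

def reverse_hex_dump(dump: str) -> bytes:
    out = bytearray()
    for line in dump.splitlines():
        if not line.strip():
            continue
        # Drop the ASCII sidebar, then drop the leading offset field;
        # bytes.fromhex ignores the remaining whitespace.
        hex_part = line.split("|")[0].split(maxsplit=1)[1]
        out += bytes.fromhex(hex_part)
    return bytes(out)

dump = "00000000  41 09 42 0d 0a 31 09 32  0d 0a                    |A.B..1.2..|"
print(reverse_hex_dump(dump))  # b'A\tB\r\n1\t2\r\n'
```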

Q: Why does the hex output show dots in the ASCII column?

A: The ASCII column shows a dot (.) for any byte that is not a printable ASCII character (outside the range 0x20-0x7E). This includes tab characters (0x09), line feeds (0x0A), carriage returns (0x0D), null bytes (0x00), and any byte above 0x7E (such as UTF-8 multi-byte sequences). This convention makes it easy to spot non-printable characters.
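That convention boils down to a one-line range check, sketched here:

```python
# Sketch of the ASCII-column convention: printable ASCII (0x20-0x7E) is
# shown as-is; any other byte becomes a dot.

def ascii_cell(b: int) -> str:
    return chr(b) if 0x20 <= b <= 0x7E else "."

print("".join(ascii_cell(b) for b in b"ID\tName\n"))  # ID.Name.
```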

Q: How does hex dump help with bioinformatics TSV files?

A: Bioinformatics TSV files (BED, GFF, VCF) often pass through multiple processing steps on different operating systems. Hex dumps help identify encoding changes, line ending conversions, and invisible characters introduced during data transfer. When a genomic data pipeline fails, a hex dump of the input TSV is often the first diagnostic step to identify the issue.