Convert TSV to HEX
TSV vs HEX Format Comparison
| Aspect | TSV (Source Format) | HEX (Target Format) |
|---|---|---|
| Format Overview | TSV (Tab-Separated Values): Plain text format for storing tabular data where columns are separated by tab characters. Clipboard-native and widely used in bioinformatics, genomics, and data pipelines. Often simpler to handle than CSV because tabs rarely appear in data, which avoids most quoting issues. | HEX (Hexadecimal Representation): A byte-level representation of file content using hexadecimal notation (base-16). Each byte is displayed as two hex digits (00-FF). Used for debugging, data inspection, binary analysis, and low-level data examination. Useful for understanding file encoding and detecting hidden characters. |
| Technical Specifications | Structure: Rows separated by newlines, columns by tabs<br>Delimiter: Tab character (U+0009)<br>Encoding: UTF-8, ASCII, or UTF-16<br>Headers: Optional first row as column names<br>Extensions: .tsv, .tab | Structure: Offset + hex bytes + ASCII representation<br>Base: Base-16 (digits 0-9, letters A-F)<br>Byte Size: 2 hex characters per byte<br>Layout: Typically 16 bytes per line<br>Extensions: .hex, .txt |
| Syntax Examples | TSV uses tab-separated columns:<br>`Name	Age	City`<br>`Alice	30	New York`<br>`Bob	25	London` | HEX shows bytes in hexadecimal:<br>`00000000  4e 61 6d 65 09 41 67 65  \|Name.Age\|`<br>`00000008  09 43 69 74 79 0a 41 6c  \|.City.Al\|`<br>`00000010  69 63 65 09 33 30 09 4e  \|ice.30.N\|`<br>`00000018  65 77 20 59 6f 72 6b 0a  \|ew York.\|` |
| Version History | Origin: Early Unix tools (cut, paste, awk)<br>MIME Type: text/tab-separated-values (registered with IANA)<br>Status: Widely used, stable | Origin: Early computing (hex dumps since 1960s)<br>Tools: xxd, hexdump, od (Unix)<br>Status: Fundamental computing tool<br>Related: Intel HEX, Motorola S-Record |
| Software Support | Spreadsheets: Excel, Google Sheets, LibreOffice<br>Text Editors: Any text editor (Notepad, VS Code)<br>Programming: Python (csv module), R, pandas<br>Other: Unix tools (cut, awk, sort), databases | CLI Tools: xxd, hexdump, od (Unix/Linux/macOS)<br>Editors: HxD, Hex Fiend, 010 Editor<br>IDEs: VS Code (Hex Editor extension)<br>Programming: Python (binascii), any language |
Why Convert TSV to HEX?
Converting TSV to hexadecimal representation reveals the exact byte-level content of your tab-delimited data. This is invaluable for debugging encoding issues, verifying that tab characters are correctly placed, detecting hidden BOM (Byte Order Mark) characters, and inspecting line ending formats (CR, LF, or CRLF). When a TSV file behaves unexpectedly, a hex dump is often the fastest way to identify the root cause.
TSV files are particularly instructive to inspect in hex because the tab delimiter (byte 09) is clearly visible between columns. This makes it easy to verify that the file is correctly formatted: you can confirm that delimiters are actual tab characters and not spaces, that line endings are consistent, and that the encoding is what you expect (UTF-8, ASCII, or UTF-16).
In bioinformatics, where tab-delimited files are a common interchange format, hex inspection is a practical way to troubleshoot data pipeline issues. When a genomic data file fails to parse, a hex dump can reveal invisible characters, wrong line endings from Windows/Unix conversion, or encoding mismatches. Our converter produces a clean hex dump with offset addresses and an ASCII sidebar for easy analysis.
The hex output follows the familiar hex dump layout produced by tools such as hexdump and xxd: each line shows the byte offset, up to 16 bytes in hexadecimal, and the ASCII representation of those bytes. Exact spacing and separators vary from tool to tool. Non-printable characters (including the tab delimiter 0x09) are shown as dots in the ASCII column, making them easy to spot.
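The layout described above can be sketched in a few lines of Python. This is a minimal illustration, not the converter's actual implementation: the function name `hex_dump` and its `width` parameter are assumptions made for the example, and the spacing may differ slightly from what xxd or hexdump print.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset, hex bytes, and an ASCII sidebar (illustrative only)."""
    hex_width = width * 3 - 1  # two digits per byte plus single spaces between them
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Bytes outside printable ASCII (0x20-0x7E) are shown as dots.
        ascii_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        # Pad the hex column so the sidebar lines up on a short final row.
        lines.append(f"{offset:08x}  {hex_part:<{hex_width}}  |{ascii_part}|")
    return "\n".join(lines)


sample = b"ID\tName\tValue\n1\tAlpha\t100\n2\tBeta\t200\n"
print(hex_dump(sample))
```

Each tab in `sample` shows up as `09` in the hex column and as a dot in the sidebar, which is the pattern to look for when checking delimiters.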
Key Benefits of Converting TSV to HEX:
- Debug Encoding: Identify UTF-8 BOM, encoding issues, and character set problems
- Verify Delimiters: Confirm tabs (0x09) are used, not spaces or other characters
- Inspect Line Endings: Detect CR (0x0D), LF (0x0A), or CRLF differences
- Find Hidden Characters: Reveal zero-width spaces, non-breaking spaces, and control chars
- Byte-Level View: See the exact binary content of every cell
- Familiar Format: Output uses the offset / hex / ASCII layout known from xxd, hexdump, and hex editors
- Data Forensics: Essential for investigating parsing failures and data corruption
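As a complement to reading the dump by eye, hidden characters can also be flagged programmatically. The sketch below is one possible approach, assuming the file has already been decoded to a Python string; the helper name `find_hidden_chars` and the choice of Unicode categories (`Cc`, `Cf`, `Zs`) are assumptions made for illustration, not part of the converter.

```python
import unicodedata

# Control characters that are legitimate structure in a TSV file.
ALLOWED_CONTROLS = {"\t", "\n", "\r"}


def find_hidden_chars(text: str):
    """Return (line, column, code point, name) for characters that are easy to miss."""
    findings = []
    # Split on "\n" only, so other control characters stay inside their line.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in ALLOWED_CONTROLS or ch == " ":
                continue
            # Cc = control, Cf = format (e.g. zero-width space), Zs = space separators.
            if unicodedata.category(ch) in ("Cc", "Cf", "Zs"):
                name = unicodedata.name(ch, "<unnamed>")
                findings.append((line_no, col, f"U+{ord(ch):04X}", name))
    return findings


sample = "id\tname\n1\tAl\u200bpha\n2\tBeta\u00a0\n"
for finding in find_hidden_chars(sample):
    print(finding)
```

Columns here count characters, not bytes, so they will not match hex dump offsets directly when the file contains multi-byte UTF-8 sequences.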
Practical Examples
Example 1: Debugging Tab Delimiters
Input TSV file (data.tsv):

    ID	Name	Value
    1	Alpha	100
    2	Beta	200

Output HEX file (data.hex):

    00000000  49 44 09 4e 61 6d 65 09 56 61 6c 75 65 0a 31 09  |ID.Name.Value.1.|
    00000010  41 6c 70 68 61 09 31 30 30 0a 32 09 42 65 74 61  |Alpha.100.2.Beta|
    00000020  09 32 30 30 0a                                   |.200.|

Tab characters are visible as 09, line feeds as 0a.
Example 2: Detecting BOM and Encoding
Input TSV file (utf8_bom.tsv), saved as UTF-8 with a BOM:

    Name	City
    Anna	Zürich

Output HEX file (utf8_bom.hex):

    00000000  ef bb bf 4e 61 6d 65 09 43 69 74 79 0a 41 6e 6e  |...Name.City.Ann|
    00000010  61 09 5a c3 bc 72 69 63 68 0a                    |a.Z..rich.|

BOM detected: ef bb bf (UTF-8 BOM). The character ü is encoded as c3 bc (two bytes in UTF-8), which is why it appears as two dots in the ASCII column.
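To see what these bytes mean for a parser, the short Python sketch below decodes the same byte sequence twice. It relies only on the standard `utf-8` and `utf-8-sig` codecs; the variable names are arbitrary.

```python
# The bytes from the hex dump above, written out by hand.
raw = bytes.fromhex(
    "ef bb bf 4e 61 6d 65 09 43 69 74 79 0a "
    "41 6e 6e 61 09 5a c3 bc 72 69 63 68 0a"
)

# Plain UTF-8 keeps the BOM as an invisible U+FEFF at the start of the first cell.
with_bom = raw.decode("utf-8")
print(repr(with_bom[:5]))

# The "utf-8-sig" codec strips a leading BOM if one is present.
without_bom = raw.decode("utf-8-sig")
print(repr(without_bom))
```

A header cell that silently starts with U+FEFF is a typical reason a column lookup by name fails even though the file looks fine in an editor.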
Example 3: Windows vs Unix Line Endings
Input TSV file (windows.tsv), saved with Windows (CRLF) line endings:

    A	B
    1	2

Output HEX file (windows.hex):

    00000000  41 09 42 0d 0a 31 09 32 0d 0a                    |A.B..1.2..|

Windows CRLF line endings are visible as 0d 0a (CR+LF). Unix would show only 0a (LF). Tab delimiters are clearly visible as 09.
Frequently Asked Questions (FAQ)
Q: What is a hex dump?
A: A hex dump is a representation of a file's binary content using hexadecimal notation. Each byte is shown as two hex digits (00-FF). A typical hex dump shows the byte offset on the left, hex bytes in the middle, and ASCII interpretation on the right. This format is used by tools like xxd, hexdump, and hex editors to inspect files at the byte level.
Q: Why would I need to convert TSV to HEX?
A: The most common reason is debugging. When a TSV file won't parse correctly, a hex dump reveals the exact bytes in the file. You can verify that tab characters (0x09) are actual tabs and not spaces, check line endings (LF vs CRLF), detect BOM markers, and find hidden characters that are invisible in text editors. This is especially important in bioinformatics pipelines where data integrity is critical.
Q: How do I identify tab characters in the hex output?
A: Tab characters appear as the hex byte 09 in the output. In the ASCII column on the right side of the hex dump, tabs are typically shown as dots (.) since they are non-printable characters. If you see spaces (hex 20) where you expect tabs, your file may not be a valid TSV file.
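The same check can be automated by counting 0x09 bytes per line. The following is a rough sketch under the assumption that the first line is a header with the correct number of columns; the function name `lines_with_wrong_tab_count` is made up for this example.

```python
def lines_with_wrong_tab_count(data: bytes):
    """Return 1-based numbers of non-empty lines whose tab count differs from line 1."""
    lines = data.splitlines()  # splits on LF, CR, and CRLF
    if not lines:
        return []
    expected = lines[0].count(b"\t")
    return [
        number
        for number, line in enumerate(lines, start=1)
        if line and line.count(b"\t") != expected
    ]


# The third line uses runs of spaces (0x20) instead of tabs (0x09).
sample = b"ID\tName\tValue\n1\tAlpha\t100\n2    Beta    200\n"
print(lines_with_wrong_tab_count(sample))
```

Any line number it reports is a good place to start reading the hex dump.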
Q: What does BOM look like in hex?
A: The UTF-8 BOM (Byte Order Mark) appears as the three bytes EF BB BF at the very beginning of the file. UTF-16 Little Endian BOM is FF FE, and UTF-16 Big Endian is FE FF. BOMs can cause parsing issues in TSV files if the parser doesn't handle them, and a hex dump is the quickest way to detect them.
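The byte patterns above are available as constants in Python's standard `codecs` module, so a simple check might look like the sketch below. The helper name `detect_bom` is an assumption for illustration, and the check deliberately covers only the three BOMs mentioned in this answer.

```python
import codecs

# Only the BOMs discussed above. UTF-32 is not handled, so a UTF-32 LE file
# (which starts with FF FE 00 00) would be reported as UTF-16 LE here.
KNOWN_BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),  # FF FE
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),  # FE FF
]


def detect_bom(data: bytes):
    """Return the name of the BOM at the start of data, or None if there is none."""
    for bom, name in KNOWN_BOMS:
        if data.startswith(bom):
            return name
    return None


print(detect_bom(b"\xef\xbb\xbfName\tCity\n"))
print(detect_bom(b"Name\tCity\n"))
```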
Q: How can I tell if my TSV has Windows or Unix line endings?
A: In the hex output, Unix line endings appear as 0A (LF - Line Feed), while Windows line endings appear as 0D 0A (CR+LF - Carriage Return + Line Feed). Old Mac line endings use 0D (CR only). Mixed line endings in a single file are a common source of parsing problems, and hex dumps make them easy to identify.
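For larger files, counting the three byte patterns is quicker than scanning the dump. A minimal sketch, with the hypothetical helper name `count_line_endings`:

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF, and bare CR line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    # Every CRLF pair also contains one CR and one LF, so subtract those.
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf
    return {"CRLF": crlf, "LF": lf, "CR": cr}


windows_sample = b"A\tB\r\n1\t2\r\n"
mixed_sample = b"a\nb\r\nc\rd"
print(count_line_endings(windows_sample))
print(count_line_endings(mixed_sample))
```

More than one non-zero count in the result indicates mixed line endings.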
Q: Is the hex output reversible back to TSV?
A: Yes. The hex representation is a lossless encoding of the original data, so a hex dump can be converted back to the original file. Tools like xxd -r (reverse) can do this, provided the dump is in a layout the tool accepts; dumps produced by other tools may need reformatting first. However, the primary purpose of TSV to HEX conversion is inspection and debugging, not data transformation.
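If no suitable tool is at hand, reversing a dump is straightforward to sketch in Python. The example below assumes the layout used in the examples on this page (offset first, then space-separated two-digit bytes, then an ASCII sidebar starting at the first `|`); the function name `dump_to_bytes` is hypothetical and other layouts would need different parsing.

```python
def dump_to_bytes(dump: str) -> bytes:
    """Rebuild the original bytes from an offset / hex / |ASCII| style dump."""
    out = bytearray()
    for line in dump.splitlines():
        # Everything from the first "|" onward is the ASCII sidebar; ignore it.
        hex_field = line.split("|", 1)[0]
        tokens = hex_field.split()
        if len(tokens) < 2:
            continue  # blank line, or an offset with no data bytes
        # tokens[0] is the offset; the remaining tokens are two-digit hex bytes.
        out += bytes.fromhex("".join(tokens[1:]))
    return bytes(out)


dump = "00000000  41 09 42 0d 0a 31 09 32 0d 0a  |A.B..1.2..|"
print(dump_to_bytes(dump))
```

The sidebar is never used for reconstruction because it is lossy: every non-printable byte becomes the same dot.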
Q: Why does the hex output show dots in the ASCII column?
A: The ASCII column shows a dot (.) for any byte that is not a printable ASCII character (outside the range 0x20-0x7E). This includes tab characters (0x09), line feeds (0x0A), carriage returns (0x0D), null bytes (0x00), and any byte above 0x7E (such as UTF-8 multi-byte sequences). This convention makes it easy to spot non-printable characters.
Q: How does hex dump help with bioinformatics TSV files?
A: Tab-delimited bioinformatics files (such as BED, GFF, and VCF) often pass through multiple processing steps on different operating systems. Hex dumps help identify encoding changes, line ending conversions, and invisible characters introduced during data transfer. When a genomic data pipeline fails, a hex dump of the input file is often a useful first diagnostic step.