TGZ (TAR.GZ) Format Guide

Available Conversions

About TGZ (TAR.GZ) Format

TGZ (TAR.GZ) is a tarball compressed with gzip — the most common archive format on Linux and Unix systems. The format combines two complementary tools: TAR (Tape Archive, created in 1979 at Bell Labs) bundles files and directories into a single sequential stream while preserving Unix permissions, ownership, timestamps, and symbolic links; gzip (GNU Zip, created in 1992 by Jean-loup Gailly and Mark Adler) then compresses this stream using the DEFLATE algorithm (LZ77 + Huffman coding). The resulting .tar.gz or .tgz file is the de facto standard for distributing source code, Linux packages, system backups, and open-source software across the Unix ecosystem.

History of TGZ

The TAR format was born in 1979 as part of Unix Version 7 at Bell Labs, originally designed to write data to sequential tape drives. When gzip was created in 1992 as a free, patent-unencumbered replacement for the Unix "compress" utility (which used the patented LZW algorithm), the combination of tar + gzip quickly became the standard archive format for the entire Unix world. The .tgz shorthand emerged for systems that preferred single-extension filenames, but .tar.gz and .tgz are identical formats. This combination proved so effective that it remained the dominant Linux archive format for over two decades, and even today — despite the availability of better compressors like xz and zstd — .tar.gz remains the most widely used archive format in the open-source ecosystem due to its universal availability and fast processing speed.

Key Features and Uses

TGZ's design reflects the Unix philosophy of combining simple tools that do one thing well. TAR handles archiving — bundling files, directories, symlinks, and metadata into a single stream — while gzip handles compression. This separation means TGZ archives fully preserve Unix file permissions (rwx), ownership (UID/GID), modification timestamps, symbolic links, and even special file types like device nodes. Gzip provides fast compression with levels from 1 (fastest) to 9 (best compression), defaulting to level 6. The format supports streaming (pipe-friendly), making it ideal for creating archives on-the-fly in shell scripts and CI/CD pipelines. TGZ is the standard format for Python source distributions (sdist), npm packages (internally), Docker image layers, and GitHub release assets.

Common Applications

TGZ is used throughout the computing ecosystem: Linux source code distribution (the Linux kernel was distributed as .tar.gz for decades), system backup and migration (tar -czf preserves all Unix metadata), open-source software releases (GitHub automatically generates .tar.gz for tags), package management (Python sdist, Ruby gems internally, npm packages), Docker and container image layers, CI/CD artifact storage, server configuration backups, and database dump archiving. The format is also standard for transferring directory trees between Unix systems, deploying web applications, and creating reproducible research archives. On the command line, creating a TGZ is as simple as tar -czf archive.tar.gz folder/, and extracting is tar -xzf archive.tar.gz.

Advantages and Disadvantages

Advantages

  • Universal Availability: tar and gzip are installed on every Unix/Linux/macOS system
  • Full Metadata: Preserves permissions, ownership, timestamps, and symlinks
  • Fast Speed: Gzip compression and decompression are extremely fast
  • Solid Compression: Entire archive compressed as one stream for better ratios
  • Streaming Support: Perfect for Unix pipes and automated pipelines
  • Low Memory: Minimal memory requirements for compression and decompression
  • Open Standard: TAR is POSIX standard, gzip is RFC 1952
  • Proven Reliable: 30+ years of production use across billions of archives
  • Parallel Option: pigz provides multi-threaded gzip for modern hardware

Disadvantages

  • No Random Access: Cannot extract individual files without decompressing the stream
  • No Encryption: No built-in password protection or encryption
  • No Append: Cannot add files without recompressing the entire archive
  • Two-Layer Complexity: TAR + gzip can confuse non-technical users
  • Windows Support: Not natively supported on older Windows (requires 7-Zip or WSL)
  • Moderate Compression: XZ and zstd achieve 20-40% better compression
  • No Error Recovery: Corruption can make subsequent data unreadable
  • No Multi-Volume: Cannot split into multiple files for size limits