XZ (LZMA2) Format Guide


About XZ (LZMA2) Format

XZ is a high-ratio lossless compression format developed by Lasse Collin as part of the XZ Utils project (formerly LZMA Utils). Released in 2009, XZ uses the LZMA2 compression algorithm, an improved version of Igor Pavlov's LZMA algorithm from the 7-Zip project. Among the traditional Unix compression tools, XZ typically achieves the best compression ratios, producing files roughly 20–40% smaller than gzip and 5–15% smaller than bzip2, depending on the input. The format has become a de facto standard for distributing Linux kernel sources, software packages, and large datasets across the open-source ecosystem.

History of XZ

The XZ format traces its roots to Igor Pavlov's LZMA SDK, first released in 2001 as part of the 7-Zip project. In 2004, Lasse Collin created LZMA Utils, a set of command-line tools that brought LZMA compression to Unix/Linux systems. By 2009, Collin had designed a new container format, .xz, and released XZ Utils to support it. The key technical change was LZMA2, which splits data into chunks that can reset the encoder state (making multi-threaded compression of independent blocks practical) and stores incompressible data in uncompressed chunks instead of expanding it, all wrapped in a more robust container with CRC-64 integrity checking by default. The Linux kernel project adopted .tar.xz as its primary distribution format, followed by major Linux distributions including Fedora, Arch Linux, and Debian. XZ quickly replaced bzip2 as the preferred high-ratio compression tool in the Unix world. In 2024, the project drew wide attention when a backdoor, planted through a supply-chain attack, was discovered in XZ Utils 5.6.0 and 5.6.1; distributions quickly reverted to earlier releases, and the incident showed how much open-source infrastructure depends on the project.

Key Features and Uses

XZ's LZMA2 algorithm combines LZ77-style dictionary compression, with dictionary sizes of up to 1.5 GiB, and range coding for entropy encoding. This produces excellent compression ratios at the cost of slower speed and higher memory usage compared to gzip. The format supports filter chains, most notably BCJ (Branch/Call/Jump) filters, which convert relative branch and call addresses in executable code to absolute ones so that repeated targets compress better. XZ natively supports multi-threaded compression (xz -T0 uses as many threads as there are CPU cores), block-based processing for parallel decompression, and stream concatenation. Each stream records an integrity check: CRC-64 by default, with CRC-32, SHA-256, or no check as alternatives. Following the Unix philosophy of composability, xz compresses a single stream of data and is typically combined with tar to create .tar.xz archives.
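The filter-chain, integrity-check, and concatenation features described above can be exercised through Python's standard-library lzma module. This is a minimal sketch, not a recommended configuration: the x86 BCJ filter, the 1 MiB dictionary, and the sample bytes are arbitrary choices, and the helper names `compress_executable` and `check_type` are hypothetical.

```python
import lzma

# Hypothetical filter chain for x86 binaries: the BCJ filter runs first and
# LZMA2 must come last. dict_size may go up to 1.5 GiB; 1 MiB keeps memory low.
EXECUTABLE_FILTERS = [
    {"id": lzma.FILTER_X86},
    {"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": 1 << 20},
]


def compress_executable(data: bytes) -> bytes:
    """Compress to .xz with the BCJ+LZMA2 chain and a SHA-256 integrity check."""
    return lzma.compress(
        data,
        format=lzma.FORMAT_XZ,
        check=lzma.CHECK_SHA256,
        filters=EXECUTABLE_FILTERS,
    )


def check_type(blob: bytes) -> int:
    """Return the integrity-check ID recorded in an .xz stream."""
    decomp = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
    decomp.decompress(blob)
    return decomp.check


sample = bytes(range(256)) * 64
blob = compress_executable(sample)

# The filter chain is stored in the block headers, so decoding needs no filters.
print(lzma.decompress(blob) == sample)        # True
print(check_type(blob) == lzma.CHECK_SHA256)  # True

# Concatenated .xz streams decode as one continuous output.
joined = lzma.compress(b"first\n") + lzma.compress(b"second\n")
print(lzma.decompress(joined))                # b'first\nsecond\n'
```

Because the chain is recorded in the file, a decoder such as `xz -d` can read the result without being told which filters were used.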

Common Applications

XZ is the standard compression format for the Linux kernel source code (kernel.org distributes releases as .tar.xz), software package repositories (Debian .deb components, Fedora/RHEL source RPMs, and Arch Linux pacman packages until Arch switched to Zstandard in 2020), and GNU project source releases. It is widely used for compressing large datasets in scientific computing, database backups where storage efficiency is paramount, and open-source project release tarballs. The Python standard library includes the lzma module for XZ/LZMA compression, and liblzma, the library behind the xz command, provides a C API. XZ is also used in embedded systems firmware distribution, where smaller images reduce flash write time and storage requirements.
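Combining xz with tar can be done entirely from Python's standard library. This is a minimal sketch, assuming the tarfile module's built-in xz modes; the directory layout and the helper names `make_tar_xz` and `list_tar_xz` are hypothetical. The final check confirms the archive begins with the .xz magic bytes (FD 37 7A 58 5A 00).

```python
import tarfile
import tempfile
from pathlib import Path


def make_tar_xz(src_dir: Path, out_path: Path, preset: int = 6) -> None:
    """Archive src_dir with tar and compress the tar stream with xz (LZMA2)."""
    with tarfile.open(out_path, "w:xz", preset=preset) as tar:
        tar.add(src_dir, arcname=src_dir.name)


def list_tar_xz(path: Path) -> list[str]:
    """Return the sorted member names of a .tar.xz archive."""
    with tarfile.open(path, "r:xz") as tar:
        return sorted(tar.getnames())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    project = root / "project"  # hypothetical source tree
    (project / "src").mkdir(parents=True)
    (project / "README").write_text("hello\n")
    (project / "src" / "main.c").write_text("int main(void) { return 0; }\n")

    archive = root / "project.tar.xz"
    make_tar_xz(project, archive)
    names = list_tar_xz(archive)
    magic = archive.read_bytes()[:6]

print(names)  # ['project', 'project/README', 'project/src', 'project/src/main.c']
print(magic == b"\xfd7zXZ\x00")  # True
```

tar handles the directory structure and xz only sees one byte stream, which is the division of labor described above.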

Advantages and Disadvantages

Advantages

  • Best Compression Ratio: 20–40% smaller than gzip, 5–15% smaller than bzip2
  • Open Source: Free; liblzma and xz are public domain (0BSD since XZ Utils 5.6.0), with some helper scripts under the GPL
  • Multi-threaded: Native parallel compression with xz -T0
  • Strong Integrity: CRC-64 and optional SHA-256 checksums
  • BCJ Filters: Improved compression for executable binaries
  • Linux Standard: Used by kernel.org, GNU, and major distributions
  • Block-based: Independent .xz blocks allow parallel compression and decompression
  • Large Dictionary: Up to 1.5 GiB for maximum compression
  • Python Built-in: lzma module in Python standard library
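The integrity guarantee can be observed with Python's built-in lzma module. In this minimal sketch, which assumes only the standard library, the payload and the position of the flipped byte are arbitrary and `is_intact` is a hypothetical helper. A single damaged byte makes decoding fail with `lzma.LZMAError`.

```python
import lzma


def is_intact(blob: bytes) -> bool:
    """Return True if blob decodes as valid .xz data, False if damage is detected."""
    try:
        lzma.decompress(blob, format=lzma.FORMAT_XZ)
    except lzma.LZMAError:
        return False
    return True


payload = b"".join(f"record {i}\n".encode() for i in range(5000))
blob = lzma.compress(payload, check=lzma.CHECK_CRC64)  # CRC-64 is also the default

damaged = bytearray(blob)
damaged[len(damaged) // 2] ^= 0xFF  # flip every bit of one byte mid-stream

print(is_intact(blob))            # True
print(is_intact(bytes(damaged)))  # False
```

The check only reports the damage; nothing in the .xz data allows the original bytes to be reconstructed.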

Disadvantages

  • Slow Speed: 5–20x slower compression than gzip, 2–5x slower decompression
  • High Memory: About 674 MiB of RAM to compress at the maximum preset (xz -9), and about 65 MiB to decompress the result
  • Single File Only: Cannot archive directories — must combine with tar
  • No Encryption: No built-in password protection or encryption
  • No Recovery: CRC detects corruption but cannot repair it
  • Limited Windows Support: Older Windows versions need 7-Zip or a similar tool
  • Limited Random Access: Seeking is possible only to block boundaries; data within a block must be decompressed sequentially
  • No Multi-volume: Cannot split into volume files natively
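The memory cost also applies to decompression, because the decoder must reserve roughly the dictionary size chosen at compression time. A minimal sketch with Python's lzma module, assuming an 8 MiB dictionary (the default preset's size) so the demo stays light; `try_decompress` is a hypothetical helper and the memory limits are arbitrary.

```python
import lzma
from typing import Optional

payload = b"example data " * 1000

# The decoder reserves memory for the full dictionary recorded in the header,
# even though this payload is only about 13 KB.
blob = lzma.compress(
    payload,
    format=lzma.FORMAT_XZ,
    filters=[{"id": lzma.FILTER_LZMA2, "dict_size": 8 << 20}],
)


def try_decompress(data: bytes, memlimit: int) -> Optional[bytes]:
    """Decompress under a memory cap (in bytes); return None if the cap is too low."""
    decomp = lzma.LZMADecompressor(format=lzma.FORMAT_XZ, memlimit=memlimit)
    try:
        return decomp.decompress(data)
    except lzma.LZMAError:
        return None


print(try_decompress(blob, memlimit=1 << 20) is None)       # True: 1 MiB is too little
print(try_decompress(blob, memlimit=32 << 20) == payload)   # True
```

The same effect scales up: a stream whose header records the 64 MiB dictionary of xz -9 needs about 65 MiB to decode.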