SPX Format Guide

SPX (Speex) is a free, open-source audio codec designed specifically for speech compression. Developed by Jean-Marc Valin under the Xiph.org Foundation, Speex was created as a patent-free alternative to the proprietary speech codecs used in Voice over IP (VoIP) and other telecommunications applications. Speex audio is typically stored in the Ogg container, and such files carry either the .spx or the .ogg extension. Speex is optimized for speech rather than general-purpose audio, delivering good speech quality at bitrates from roughly 2 to 44 kbps. The codec supports narrowband (8 kHz), wideband (16 kHz), and ultra-wideband (32 kHz) sampling rates, covering the usual range of speech telephony requirements. Although Speex has been officially superseded by the Opus codec (also from Xiph.org), it remains widely deployed in existing VoIP systems and legacy applications.
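Because Speex data normally sits inside an Ogg container, a file can be identified by inspecting its first Ogg page rather than trusting the extension. The following is a minimal sketch, not a full Ogg demuxer. It assumes the identification header is the first packet on the first page and follows the 80-byte layout of the SpeexHeader structure in libspeex (an 8-byte "Speex   " tag, a 20-byte version string, then thirteen little-endian 32-bit integers). The function name read_speex_header and the field names in the returned dictionary are illustrative choices; check the layout against speex_header.h before relying on it.

```python
import struct

OGG_CAPTURE = b"OggS"      # capture pattern at the start of every Ogg page
SPEEX_MAGIC = b"Speex   "  # 8-byte tag: "Speex" padded with three spaces

# Assumed order of the 13 int32 fields that follow the version string.
FIELD_NAMES = (
    "speex_version_id", "header_size", "rate", "mode",
    "mode_bitstream_version", "nb_channels", "bitrate", "frame_size",
    "vbr", "frames_per_packet", "extra_headers", "reserved1", "reserved2",
)
MODE_NAMES = {0: "narrowband", 1: "wideband", 2: "ultra-wideband"}


def read_speex_header(data: bytes) -> dict:
    """Parse the Speex identification header from the first Ogg page."""
    if data[:4] != OGG_CAPTURE:
        raise ValueError("not an Ogg stream")
    if len(data) < 27:
        raise ValueError("truncated Ogg page header")
    # The fixed Ogg page header is 27 bytes; its last byte is the number of
    # entries in the segment (lacing) table that follows immediately.
    n_segments = data[26]
    payload = data[27 + n_segments:]
    if payload[:8] != SPEEX_MAGIC:
        raise ValueError("first packet is not a Speex header")
    if len(payload) < 80:
        raise ValueError("truncated Speex header")
    # Offset 28 = 8-byte tag + 20-byte version string.
    values = struct.unpack_from("<13i", payload, 28)
    header = dict(zip(FIELD_NAMES, values))
    header["version"] = payload[8:28].split(b"\0", 1)[0].decode("ascii", "replace")
    header["mode_name"] = MODE_NAMES.get(header["mode"], "unknown")
    return header
```

Reading the first few hundred bytes of a file and passing them to read_speex_header is enough for this check, since the identification header sits at the very start of the stream.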

History of Speex

Speex was created by Jean-Marc Valin in 2002 while he was a graduate student at the University of Sherbrooke in Quebec, Canada. Valin developed Speex as part of the Xiph.org Foundation's mission to create free, open multimedia standards, complementing the Vorbis (music) and Theora (video) codecs. The initial release focused on narrowband speech at 8 kHz, targeting the VoIP applications that were growing rapidly in the early 2000s. Wideband and ultra-wideband modes were added in subsequent releases, improving speech quality significantly. Speex quickly gained adoption in open-source VoIP software, including Asterisk PBX and VoIP stacks such as Opal and Ooh323c. The codec was integrated into numerous commercial and open-source applications, including Skype (early versions), Mumble, TeamSpeak, and various gaming voice chat systems. In 2007, Valin began work on a new codec that would eventually become Opus, designed to handle both speech and music efficiently. In 2012, the IETF standardized Opus as RFC 6716, and Xiph.org officially declared Speex obsolete in favor of Opus. Despite this, Speex continues to be used in many existing systems and remains part of the Xiph.org codec family.

Key Features and Uses

Speex employs Code-Excited Linear Prediction (CELP) encoding, which models the human vocal tract to achieve efficient speech compression. The codec operates at bitrates ranging from 2.15 kbps (narrowband, quality 0) to 44.2 kbps (ultra-wideband, quality 10), with most practical usage between 8 and 24 kbps. Speex supports both Variable Bit Rate (VBR) and Constant Bit Rate (CBR) encoding, with VBR providing better quality-to-size ratios for non-real-time applications. The companion SpeexDSP library adds acoustic echo cancellation, noise suppression, and automatic gain control, so that the codec and library together form a fairly complete speech processing toolkit. Speex supports Voice Activity Detection (VAD) and Discontinuous Transmission (DTX), which reduce bandwidth usage during periods of silence. The codec's frame size is 20 ms, a good balance between latency and compression efficiency for real-time communications. Speex also supports intensity stereo encoding for wideband and ultra-wideband modes.
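The relationship between sampling rate, the 20 ms frame size, and bitrate comes down to simple arithmetic, sketched below. The helper names are illustrative, and the 16-bit mono PCM input used for the uncompressed comparison is an assumption rather than something the format requires.

```python
FRAME_MS = 20  # frame duration stated in the text

SAMPLE_RATES_HZ = {
    "narrowband": 8000,
    "wideband": 16000,
    "ultra-wideband": 32000,
}


def samples_per_frame(sample_rate_hz: int, frame_ms: int = FRAME_MS) -> int:
    """Number of PCM samples covered by one frame."""
    return sample_rate_hz * frame_ms // 1000


def bits_per_frame(bitrate_bps: float, frame_ms: int = FRAME_MS) -> float:
    """Encoded size of one frame, in bits, at a constant bitrate."""
    return bitrate_bps * frame_ms / 1000


def raw_pcm_bitrate(sample_rate_hz: int, bits_per_sample: int = 16) -> int:
    """Bitrate of uncompressed mono PCM (16-bit samples are an assumption)."""
    return sample_rate_hz * bits_per_sample


for name, rate in SAMPLE_RATES_HZ.items():
    print(f"{name}: {samples_per_frame(rate)} samples per frame")

# Lowest and highest bitrates quoted in the text.
print(bits_per_frame(2150))   # 43.0 bits per 20 ms frame
print(bits_per_frame(44200))  # 884.0 bits per 20 ms frame
```

At 8 kHz a frame therefore spans 160 samples, at 16 kHz 320, and at 32 kHz 640. Under the 16-bit assumption, uncompressed narrowband speech runs at 128 kbps, which gives a sense of how much the codec removes even at its higher quality settings.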

Common Applications

Speex has been widely deployed in VoIP applications, where its speech-optimized compression and low latency make it ideal for real-time voice communication. The codec is used in open-source PBX systems like Asterisk and FreeSWITCH for internal and external voice calls. Gaming voice chat applications including Mumble and early versions of TeamSpeak used Speex for in-game communication. Speex found significant use in embedded systems and IoT devices where speech compression was needed with minimal computational resources. The codec is used in various recording applications for dictation, voice memos, and interview recordings where speech quality is prioritized over music fidelity. Speex is deployed in conference calling systems and webinar platforms that need efficient narrowband or wideband speech encoding. The SpeexDSP library's echo cancellation and noise suppression features are used independently of the codec in many audio processing applications. While new projects generally adopt Opus instead, Speex remains in active use in legacy VoIP infrastructure, embedded systems with existing Speex implementations, and applications where the simpler Speex decoder is preferred over the more complex Opus decoder.

Advantages and Disadvantages

Advantages

Speech-Optimized: Designed specifically for human voice with CELP encoding
Very Low Bitrates: Usable speech quality from 2 kbps, excellent at 8-15 kbps
Patent-Free: Open source and free to use under a BSD-style license
Built-in DSP: Echo cancellation, noise suppression, and gain control included
Low Latency: 20 ms frame size suitable for real-time communications
Voice Activity Detection: VAD and DTX reduce bandwidth during silence
Multiple Sampling Rates: Narrowband, wideband, and ultra-wideband modes
Lightweight Decoder: Low CPU requirements for embedded and mobile systems
Widely Deployed: Extensive existing VoIP and gaming infrastructure

Disadvantages

Officially Obsolete: Xiph.org recommends Opus as the successor codec
Speech Only: Poor quality for music, sound effects, and non-speech audio
Inferior to Opus: Opus generally provides better speech quality at comparable bitrates
Limited Sample Rate: Maximum 32 kHz, insufficient for high-fidelity audio
No Browser Support: Web browsers do not support Speex natively
Declining Adoption: New projects overwhelmingly choose Opus instead
No Lossless Mode: Always lossy compression, unsuitable for archival
Mono/Stereo Only: No multichannel surround sound support