Convert AAC to SPX


AAC vs SPX Format Comparison

Aspect AAC (Source Format) SPX (Target Format)
Format Overview
AAC
Advanced Audio Coding

Advanced Audio Coding (AAC) is a lossy audio codec standardized by ISO/IEC as part of MPEG-2 and MPEG-4 specifications. Developed as the successor to MP3, AAC delivers superior audio quality at equivalent bitrates through improved frequency resolution and more efficient coding of transient signals. It is the default audio format for Apple devices, YouTube, and most streaming platforms.

Lossy Modern
SPX
Speex Speech Codec

Speex is a free, open-source audio codec specifically designed for speech compression. Developed by Jean-Marc Valin under the Xiph.Org Foundation, Speex supports narrowband (8 kHz), wideband (16 kHz), and ultra-wideband (32 kHz) encoding at bitrates from 2 to 44 kbps. It was widely used in VoIP applications before being succeeded by the Opus codec.

Lossy Legacy
Technical Specifications
AAC
Sample Rates: 8 kHz – 96 kHz
Bit Rates: 8–529 kbps (CBR/VBR)
Channels: Up to 48 channels (7.1 surround common)
Codec: AAC-LC, HE-AAC, HE-AAC v2
Container: ADTS (.aac), M4A, MP4
SPX
Sample Rates: 8 kHz, 16 kHz, 32 kHz
Bit Rates: 2–44 kbps (VBR/CBR/ABR)
Channels: Mono, Stereo
Codec: Speex (CELP-based)
Container: Ogg (.spx)
Audio Encoding

AAC uses modified discrete cosine transform (MDCT) with advanced psychoacoustic modeling for efficient lossy compression:

# Encode to AAC at 256 kbps
ffmpeg -i input.wav -codec:a aac \
  -b:a 256k output.aac

# High-quality AAC with libfdk_aac
ffmpeg -i input.wav -codec:a libfdk_aac \
  -vbr 5 output.m4a

Speex uses Code-Excited Linear Prediction (CELP) optimized for human speech, with built-in voice activity detection and comfort noise generation:

# Encode to Speex wideband
ffmpeg -i input.wav -codec:a libspeex \
  -ar 16000 output.spx

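# Speex with higher encoder complexity (0-10); in FFmpeg's libspeex wrapper,
# -compression_level trades encoding speed for quality and does not change the bitrate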
ffmpeg -i input.wav -codec:a libspeex \
  -compression_level 8 output.spx
Audio Features
AAC
  • Metadata: MP4/M4A container tags, iTunes metadata
  • Album Art: Embedded cover images in M4A container
  • Gapless Playback: Supported with iTunSMPB atom
  • Streaming: Excellent (HLS, DASH adaptive streaming)
  • Surround: Up to 7.1 channels supported
  • DRM: FairPlay DRM in M4P container
SPX
  • Metadata: Vorbis comment tags in Ogg container
  • Voice Activity Detection: Built-in VAD for silence suppression
  • Preprocessing: Noise suppression and acoustic echo cancellation via the companion SpeexDSP library
  • Streaming: Designed for real-time VoIP streaming
  • Surround: None; mono and stereo only
  • Bitrate Control: VBR, CBR, and ABR modes supported
Advantages
AAC
  • Better quality than MP3 at equivalent bitrates
  • Default format for Apple ecosystem and YouTube
  • Supports multichannel surround sound up to 48 channels
  • HE-AAC provides excellent quality at very low bitrates
  • Widely supported across platforms and devices
  • Efficient streaming with adaptive bitrate support
SPX
  • Extremely low bitrate speech compression (2–44 kbps)
  • Built-in voice activity detection and noise suppression
  • Very low latency suitable for real-time communication
  • Patent-free and open-source (BSD license)
  • Three bandwidth modes: narrowband, wideband, ultra-wideband
  • Integrated acoustic echo cancellation for VoIP
Disadvantages
AAC
  • Lossy compression discards audio data permanently
  • Some encoder implementations are patent-encumbered
  • Quality varies significantly between AAC encoders
  • Not ideal for professional audio editing workflows
  • Re-encoding causes cumulative quality degradation
SPX
  • Officially obsoleted by the Opus codec since 2012
  • Poor quality for music; optimized only for speech
  • Maximum sample rate limited to 32 kHz
  • Limited software support in modern applications
  • Mono and stereo only; no surround sound capability
Common Uses
AAC
  • Apple Music and iTunes Store distribution
  • YouTube and streaming platform audio
  • Digital broadcasting (DAB+, DVB)
  • Mobile audio on iOS and Android devices
  • Video soundtracks in MP4 containers
SPX
  • VoIP and internet telephony applications
  • Voice recording and dictation
  • Voice chat in gaming applications
  • Embedded systems with limited bandwidth
  • Legacy voice communication software
Best For
AAC
  • Music streaming and distribution
  • Mobile and portable audio playback
  • Video soundtrack encoding
  • Low-bitrate streaming where quality matters
SPX
  • Low-bandwidth voice communication
  • VoIP applications requiring minimal latency
  • Speech recording and archival at very low bitrates
  • Embedded and IoT voice applications
Version History
AAC
Introduced: 1997 (MPEG-2 Part 7)
Current Version: MPEG-4 AAC (HE-AAC v2, xHE-AAC)
Status: Industry standard, actively developed
Evolution: AAC-LC (1997) → HE-AAC (2003) → HE-AAC v2 (2006) → xHE-AAC (2012)
SPX
Introduced: 2002 (Xiph.Org Foundation)
Stable Release: Speex 1.2.0 (2016; the 1.2 release candidates date from 2008)
Status: Obsoleted by Opus (2012), still functional
Evolution: Speex (2002) → Opus (2012, successor)
Software Support
AAC
Media Players: VLC, iTunes, WMP, foobar2000
DAWs: Logic Pro, GarageBand, Adobe Audition
Mobile: iOS, Android (native support)
Web Browsers: Chrome, Firefox, Safari, Edge
Streaming: Apple Music, YouTube, Spotify
SPX
Media Players: VLC, foobar2000, MPlayer
VoIP: Asterisk, FreeSWITCH, Oribter (legacy)
Mobile: Limited (requires third-party apps)
Web Browsers: Not natively supported
Libraries: libspeex, FFmpeg, GStreamer

Why Convert AAC to SPX?

Converting AAC to SPX transforms your audio into the Speex speech codec format, which is specifically optimized for encoding human voice at extremely low bitrates (2-44 kbps). While Speex has been officially obsoleted by Opus, it remains useful in legacy VoIP systems, embedded devices with Speex-only decoders, and applications requiring compatibility with older voice communication infrastructure.

Converting from AAC to Speex narrows the audio significantly — Speex is designed exclusively for speech and operates at much lower bitrates and sample rates than AAC. Any music or complex audio in the source will sound poor after conversion. This is practical only when you need to feed voice content into systems that specifically require Speex encoding.

Speex includes features valuable for voice applications: voice activity detection (VAD) automatically detects silence periods, comfort noise generation fills pauses naturally, and acoustic echo cancellation is available through the companion SpeexDSP library. These features make Speex particularly useful in bidirectional communication systems, even though newer alternatives like Opus provide similar capabilities with better quality.

Keep in mind that Speex operates at a maximum sample rate of 32 kHz (ultra-wideband mode) and bitrates of 2-44 kbps. Any source audio exceeding these specifications will be downsampled and compressed to fit within Speex's constraints. For new projects, consider Opus instead — it is the official successor to Speex with superior quality at all bitrates. Use Speex only when legacy system compatibility is required.
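
The downsampling constraint above can be sketched as a small shell helper. This is only one possible policy (pick the highest Speex mode that does not exceed the source rate, so nothing is upsampled); the function name speex_rate and the file names are hypothetical.

```shell
#!/bin/sh
# Pick a Speex sample rate (Hz) for a given source sample rate (Hz).
# Speex only accepts 8000, 16000, or 32000 Hz, so anything else must be resampled.
# Assumed policy: choose the highest mode that does not exceed the source rate.
speex_rate() {
  src_hz=$1
  if [ "$src_hz" -ge 32000 ]; then
    echo 32000   # ultra-wideband
  elif [ "$src_hz" -ge 16000 ]; then
    echo 16000   # wideband
  else
    echo 8000    # narrowband
  fi
}

# Example use with ffmpeg (input.aac and output.spx are placeholder names):
# ffmpeg -i input.aac -vn -c:a libspeex -ar "$(speex_rate 44100)" output.spx
```

A typical 44.1 kHz AAC source would map to ultra-wideband under this policy, while a 22.05 kHz source would fall back to wideband.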

Key Benefits of Converting AAC to SPX:

  • Ultra-Low Bitrate: Speex achieves clear speech at just 2-44 kbps
  • VoIP Optimized: Built-in voice activity detection and comfort noise generation
  • Legacy Compatibility: Works with older VoIP systems and Speex-based platforms
  • Speech Focus: CELP coding specifically optimized for the human voice
  • Patent Free: No licensing concerns with the open-source Speex codec
  • Low Latency: Minimal encoding delay suitable for real-time communication
  • Embedded Systems: Low complexity suitable for resource-constrained devices

Practical Examples

Example 1: VoIP System Integration

Scenario: A call center needs to convert AAC-format voice prompts and IVR recordings to Speex format for their legacy VoIP PBX system that only supports Speex encoding.

Source: ivr_greeting_english.aac (30 sec)
Conversion: AAC → SPX (16 kHz wideband, 24 kbps)
Result: ivr_greeting_english.spx (about 90 KB)

VoIP Integration:
1. Convert AAC prompts to SPX wideband
2. Upload to Asterisk/FreeSWITCH PBX system
3. Configure IVR menu with SPX audio files
4. Test playback quality on VoIP handsets
5. Deploy across call center phone system
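
Step 1 of the list above could be scripted roughly as follows. This is a sketch, not a tested deployment script: the function name convert_prompts, the DRY_RUN switch, and the choice of mono output (-ac 1) are assumptions for illustration, and it presumes an ffmpeg build with libspeex enabled. Uploading to the PBX and configuring the IVR are not covered.

```shell
#!/bin/sh
# Convert every .aac prompt in a directory to 16 kHz wideband Speex at 24 kbps.
# Mono output (-ac 1) is an assumption; telephony prompts are usually mono.
# With DRY_RUN=1 the ffmpeg commands are printed instead of executed.
convert_prompts() {
  dir=$1
  for src in "$dir"/*.aac; do
    [ -e "$src" ] || continue          # skip when the glob matches nothing
    dst="${src%.aac}.spx"              # same name, .spx extension
    set -- ffmpeg -nostdin -i "$src" -vn -c:a libspeex -ar 16000 -ac 1 -b:a 24k "$dst"
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "$@"                        # show the command only
    else
      "$@" || return 1                 # stop on the first failed conversion
    fi
  done
}

# Example (placeholder directory name):
# DRY_RUN=1 convert_prompts ./ivr_prompts
```

Running with DRY_RUN=1 first makes it easy to review the generated commands before any files are written.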

Example 2: Low-Bandwidth Voice Streaming

Scenario: A remote monitoring application needs to transmit voice annotations from field devices over a satellite connection with very limited bandwidth, requiring conversion from AAC to ultra-compact Speex.

Source: field_report_042.aac (3 min)
Conversion: AAC → SPX (8 kHz narrowband, 8 kbps)
Result: field_report_042.spx (about 180 KB)

Bandwidth Savings:
✓ Extreme compression for voice content
✓ Clear speech at satellite-friendly bitrate
✓ Built-in VAD skips silence periods
✓ Minimal bandwidth usage for voice transmission
✓ Compatible with Speex-based receiving equipment

Example 3: Legacy Gaming Voice Chat

Scenario: A game mod maintainer needs to convert AAC voice recordings to Speex for a legacy multiplayer game engine that uses the Speex codec for in-game voice communication.

Source: voice_taunt_pack.aac (10 clips, ~5 sec each)
Conversion: AAC → SPX (16 kHz wideband, 18 kbps)
Result: voice_taunt_pack.spx (~11 KB per clip)

Game Integration:
✓ Convert to SPX for legacy game engine compatibility
✓ Match existing voice chat codec settings
✓ Maintain consistent audio quality with in-game voice
✓ Small file size for fast network transmission
✓ Compatible with Speex-based voice chat module

Frequently Asked Questions (FAQ)

Q: Why would I convert AAC to SPX (Speex)?

A: The main reason is compatibility with legacy VoIP systems, embedded devices, or older voice chat applications that specifically require Speex encoding. Speex is also useful when you need extreme compression for voice content at bitrates as low as 2 kbps. For new projects, consider Opus instead.

Q: Will converting AAC to SPX lose audio quality?

A: Yes — significantly. Speex is designed for speech at very low bitrates (2-44 kbps) and operates at a maximum sample rate of 32 kHz. Any audio content beyond the speech frequency range will be lost, and overall fidelity will be substantially reduced compared to AAC.

Q: Can Speex handle music or just speech?

A: Speex is designed exclusively for speech. It uses CELP algorithms tuned for the human voice, so music encoded with Speex sounds very poor: metallic, narrow, and heavily distorted. For music, use Opus, Ogg Vorbis, or another general-purpose codec.

Q: What is the best Speex mode for voice quality?

A: Ultra-wideband mode (32 kHz) at the highest quality setting provides the best Speex voice quality at about 44 kbps. Wideband (16 kHz) at medium quality is the most common balance. Narrowband (8 kHz) is only for telephone-grade voice.

Q: Should I use Speex or Opus for VoIP?

A: Use Opus — it is the official successor to Speex, provides better quality at all bitrates, handles both speech and music, and is the mandatory codec for WebRTC. Use Speex only when you must support legacy systems that cannot decode Opus.

Q: Does Speex support stereo audio?

A: Yes, Speex supports stereo encoding through its intensity stereo mode. However, stereo Speex is primarily for voice and does not provide the spatial quality of general-purpose codecs. Most Speex usage is mono.

Q: What file extension does Speex use?

A: Speex audio files use the .spx extension and are stored in the Ogg container format. The files can also appear as .ogg with Speex codec identification. Our converter produces standard .spx files in Ogg containers.
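
Because the extension alone does not identify the codec, a quick byte-level check can help tell Speex apart from other Ogg streams. The sketch below assumes the usual layout, where the file starts with the Ogg capture pattern "OggS" and the first page carries a single header packet, placing the "Speex" signature at byte offset 28. The function name is_ogg_speex is hypothetical, and unusual files may not follow this layout.

```shell
#!/bin/sh
# Return 0 if the file looks like Ogg-encapsulated Speex, 1 otherwise.
# Assumed layout: 27-byte Ogg page header + 1-byte segment table, so the
# Speex header packet (which begins with "Speex") starts at byte 28.
is_ogg_speex() {
  file=$1
  magic=$(dd if="$file" bs=1 count=4 2>/dev/null)
  codec=$(dd if="$file" bs=1 skip=28 count=5 2>/dev/null)
  [ "$magic" = "OggS" ] && [ "$codec" = "Speex" ]
}

# Example (placeholder file name):
# is_ogg_speex recording.ogg && echo "Speex stream"
```

For anything beyond a quick sanity check, a tool such as ffprobe reports the codec more reliably.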

Q: How small can a Speex file be?

A: Extremely small. At 8 kbps narrowband, a 1-minute voice recording takes only about 60 KB. At the minimum 2.15 kbps rate, roughly 16 KB per minute. This extreme compression makes Speex valuable for very low-bandwidth applications.
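
The figures in this answer follow from size = bitrate × duration ÷ 8. A minimal shell sketch of that arithmetic is shown below; the function name speex_size_kb is hypothetical, 1 KB is taken as 1000 bytes, and Ogg container overhead and any savings from silence suppression are ignored.

```shell
#!/bin/sh
# Estimate Speex payload size in KB (1 KB = 1000 bytes).
# The bitrate is given in bits per second so 2.15 kbps can be written as 2150.
# Integer arithmetic truncates the result toward zero.
speex_size_kb() {
  bps=$1
  seconds=$2
  echo $(( bps * seconds / 8 / 1000 ))
}

speex_size_kb 8000 60    # 1 minute at 8 kbps    -> 60
speex_size_kb 2150 60    # 1 minute at 2.15 kbps -> 16
```

The same helper can be used to sanity-check expected output sizes before a batch conversion, for example 30 seconds at 24 kbps comes to about 90 KB.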