Convert DJVU to TOML


DJVU vs TOML Format Comparison

Aspect DJVU (Source Format) TOML (Target Format)
Format Overview
DJVU
DjVu Document Format

Compressed document format developed at AT&T Labs (1996) and specialized for scanned documents with text and images. It compresses the text mask and the background image in separate layers, which keeps digitized pages small.

Standard Format | Lossy Compression
TOML
Tom's Obvious Minimal Language

Configuration file format created by Tom Preston-Werner (GitHub co-founder) in 2013. Designed to be a minimal, unambiguous configuration format that maps clearly to a hash table. Used for Rust's Cargo.toml and Python's pyproject.toml.

Standard Format | Lossless
Technical Specifications
DJVU
Structure: Multi-layer compressed format
Encoding: Binary with embedded text layer
Format: IFF85-based container
Compression: Wavelet (IW44) + JB2
Extensions: .djvu, .djv
TOML
Structure: Tables with key-value pairs
Encoding: UTF-8 (required)
Format: TOML v1.0.0 specification
Compression: None (plain text)
Extensions: .toml
Syntax Examples

DJVU stores compressed page layers:

AT&T FORM:DJVM  (IFF85 container, multi-page)
├── DIRM  (page directory)
└── FORM:DJVU  (one per page)
    ├── INFO  (page size, resolution)
    ├── Sjbz  (text mask)
    ├── BG44  (background)
    └── TXTz  (hidden text)

TOML uses tables and typed values:

[document]
title = "Extracted Document"
source = "document.djvu"
total_pages = 5

[[pages]]
number = 1
content = """
First page text content
spanning multiple lines."""

[[pages]]
number = 2
content = "Second page text"
Content Support
DJVU
  • Scanned document pages
  • Mixed text and image content
  • Hidden OCR text layer
  • Multi-page documents
  • Annotations
TOML
  • Typed values (string, int, float, bool)
  • Date and datetime types
  • Tables (hash maps)
  • Arrays and inline tables
  • Multi-line strings (triple quotes)
  • Comments with # prefix
Advantages
DJVU
  • Excellent compression for scanned docs
  • Much smaller than PDF for scans
  • Fast page rendering
  • Searchable with OCR text
TOML
  • Unambiguous semantics (unlike YAML)
  • Native data types
  • Multi-line literal strings
  • Comment support
  • Formal v1.0 specification
  • Maps directly to hash tables
Disadvantages
DJVU
  • Limited native software support
  • Not editable as a document
  • Lossy compression for images
  • Less popular than PDF
TOML
  • Less widely adopted than YAML/JSON
  • Verbose for deeply nested data
  • Newer format, fewer tools
  • Not supported in some ecosystems
Common Uses
DJVU
  • Scanned book archives
  • Digital library collections
  • Academic paper distribution
  • Historical document preservation
TOML
  • Rust Cargo.toml package manifests
  • Python pyproject.toml
  • Hugo static site configs
  • Netlify deployment configs (netlify.toml)
  • Application settings
Best For
DJVU
  • Compact scanned page storage
  • Digitized book distribution
  • Archiving paper documents
  • Bandwidth-limited environments
TOML
  • Typed configuration data
  • Rust and Python project files
  • Unambiguous settings files
  • Modern config file needs
Version History
DJVU
Introduced: 1996 (AT&T Labs)
Developers: Yann LeCun, Leon Bottou
Status: Stable, open specification
Evolution: DjVuLibre open-source tools
TOML
Introduced: 2013 (Tom Preston-Werner)
Current Version: TOML v1.0.0 (2021)
Status: Active, stable v1.0 spec
Evolution: Growing adoption in Rust/Python
Software Support
DJVU
DjView: Native cross-platform viewer
Okular: KDE document viewer
Evince: GNOME document viewer
Other: SumatraPDF, browser plugins
TOML
Python: tomllib (3.11+), tomli, toml
Rust: toml crate (native ecosystem)
JavaScript: @iarna/toml, toml npm
Other: Go, Java, C# libraries available

Why Convert DJVU to TOML?

Converting DJVU to TOML produces structured output with explicit data types. TOML distinguishes strings, integers, floats, booleans, and dates at the format level, so downstream code can tell page text from numeric metadata without guessing.

TOML is the native configuration format for the Rust ecosystem (Cargo.toml) and Python's modern packaging system (pyproject.toml). Converting scanned documentation to TOML integrates naturally with development workflows that already use this format for project configuration and metadata.

Unlike YAML, TOML never infers a type from unquoted text: strings are always quoted, and indentation carries no meaning. That removes the usual surprises from implicit type conversion and whitespace interpretation, which makes TOML output more reliable for automated processing pipelines where consistency matters.

TOML's triple-quoted multi-line strings (""") provide clean representation of extracted paragraphs, while its array-of-tables syntax ([[pages]]) naturally represents the page-by-page structure of a converted DJVU document. The result is a clean, typed, and predictable data format.

Key Benefits of Converting DJVU to TOML:

  • Typed Data: Strings, numbers, dates distinguished at format level
  • Unambiguous: Explicit syntax avoids YAML-style implicit typing surprises
  • Rust Integration: Native format for Cargo and Rust ecosystem
  • Python Compatible: Built-in tomllib in Python 3.11+
  • Multi-line Strings: Triple-quoted blocks for paragraph content
  • Comment Support: Annotate extracted content with # comments
  • Modern Standard: Clean v1.0 specification with growing adoption

Practical Examples

Example 1: API Documentation Extraction

Input DJVU file (api_docs.djvu):

Scanned API reference manual with
endpoint descriptions, parameter lists,
and response format documentation

Output TOML file (api_docs.toml):

[document]
title = "API Reference Manual"
source = "api_docs.djvu"
total_pages = 24

[[pages]]
number = 1
content = """
REST API Reference v2.0
Authentication: Bearer Token required
Base URL: https://api.example.com/v2"""

[[pages]]
number = 2
content = """
GET /users - List all users
Parameters: page, limit, sort
Response: JSON array of user objects"""

Example 2: Hardware Datasheet

Input DJVU file (datasheet.djvu):

Scanned electronic component datasheet:
- Pin configurations
- Electrical characteristics
- Operating parameters

Output TOML file (datasheet.toml):

[document]
title = "IC-7805 Voltage Regulator"
source = "datasheet.djvu"
total_pages = 8

[[pages]]
number = 1
content = """
IC-7805 Voltage Regulator
Input: 7-35V DC
Output: 5V DC regulated
Max current: 1.5A"""

[[pages]]
number = 2
content = """
Pin Configuration:
Pin 1: Input
Pin 2: Ground
Pin 3: Output"""

Example 3: Standards Document

Input DJVU file (standard.djvu):

Scanned technical standard document:
- Scope and definitions
- Requirements and testing methods
- Compliance criteria

Output TOML file (standard.toml):

[document]
title = "Safety Standard BS EN 60950"
source = "standard.djvu"
total_pages = 120

[[pages]]
number = 1
content = """
Scope: This standard applies to
information technology equipment
intended for use at altitudes up
to 5000m above sea level."""

[[pages]]
number = 2
content = """
Definitions:
Accessible part: A part that can be
touched with a test probe."""

Frequently Asked Questions (FAQ)

Q: What is TOML format?

A: TOML (Tom's Obvious Minimal Language) is a configuration file format created by GitHub co-founder Tom Preston-Werner. It is designed to be minimal, unambiguous, and easy to read. It is the standard format for Rust's Cargo.toml and Python's pyproject.toml.

Q: How does TOML differ from YAML?

A: TOML requires every string to be quoted and never infers types from bare words, while YAML has known gotchas (for example, YAML 1.1 parsers read an unquoted no as the boolean false). TOML uses explicit syntax ([tables], key = "value") instead of indentation. TOML is simpler but less suited for deeply nested data.

Q: Can I parse TOML with Python?

A: Yes. Python 3.11+ includes the tomllib module in the standard library. On older Python versions, install tomli, which provides the same API, or the older toml package. With tomllib the file must be opened in binary mode: import tomllib; data = tomllib.load(open('file.toml', 'rb')).
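A slightly fuller sketch of the same idea follows. It falls back to tomli on older interpreters and closes the file when done. The helper names load_toml and page_texts are illustrative only, and page_texts assumes the [[pages]] layout with number and content keys shown in the examples on this page.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API


def load_toml(path):
    """Parse a TOML file and return its contents as a dict."""
    # tomllib.load() requires a file opened in binary mode.
    with open(path, "rb") as fh:
        return tomllib.load(fh)


def page_texts(data):
    """Return the page contents ordered by page number."""
    pages = sorted(data.get("pages", []), key=lambda p: p["number"])
    return [p["content"] for p in pages]
```

For example, page_texts(load_toml("api_docs.toml")) would return one string per page, assuming a file with that name exists in the working directory.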

Q: How are multi-line texts handled?

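A: Page text is written as a multi-line basic string, delimited by triple double quotes ("""). Line breaks are stored as-is and the newline right after the opening delimiter is ignored, so paragraphs keep their structure. Backslashes and other special characters inside the text are still escaped; TOML's escape-free variant is the multi-line literal string, which uses triple single quotes (''').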

Q: Is the output valid TOML?

A: Yes, the converter produces TOML that conforms to the v1.0.0 specification. Special characters in extracted text are properly escaped, and the output parses with any compliant TOML parser.

Q: Can I use this with Rust projects?

A: The output is standard TOML readable by Rust's toml crate. While the structure represents document content (not a Cargo manifest), you can parse it with serde and toml for any Rust data processing application.

Q: What about dates and numbers in the text?

A: Extracted text is stored as string values to preserve the original content exactly. Page numbers are stored as integers. TOML's type system ensures clear distinction between text content and numeric metadata.

Q: Is the conversion free?

A: Yes, completely free with secure processing and automatic file deletion after conversion.