OSTP Protocol Overview

Packet Format

The OSTP packet consists of a standard RTP header followed by an RTP header extension carrying the OSTP fields, then the audio payload and a CRC-32 trailer for integrity checking.

Field	Contents	Size
RTP Header	V=2, PT, SSRC, Seq	12 bytes
RTP Extension	Profile 0x4F53	4 bytes
OSTP Extension	stream_id, sequence_ext, media_timestamp	8 bytes
Audio Payload	Opus / PCM / AAC	max 12,288 bytes
CRC-32	Trailer	4 bytes
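The following Python sketch packs and parses the 24 header bytes that precede the audio payload in the layout above. It rests on several assumptions: network (big-endian) byte order, no CSRC entries, and the RFC 3550 header-extension framing (a 16-bit profile field followed by a 16-bit length in 32-bit words, which is 2 here). The function names are hypothetical.

```python
import struct

OSTP_PROFILE = 0x4F53      # ASCII "OS" in the RTP extension's profile field
RTP_HEADER_LEN = 12        # fixed RTP header, no CSRC entries (CC=0)
EXT_HEADER_LEN = 4         # profile (16 bits) + length in 32-bit words (16 bits)
OSTP_EXT_LEN = 8           # stream_id u16, sequence_ext u16, media_timestamp u32
HEADERS_LEN = RTP_HEADER_LEN + EXT_HEADER_LEN + OSTP_EXT_LEN  # 24 bytes

def pack_headers(payload_type: int, seq: int, rtp_ts: int, ssrc: int,
                 stream_id: int, sequence_ext: int, media_timestamp: int) -> bytes:
    """Build the header bytes preceding the payload (big-endian byte order assumed)."""
    b0 = (2 << 6) | (1 << 4)   # V=2, P=0, X=1 (extension present), CC=0
    b1 = payload_type & 0x7F   # M=0, 7-bit payload type
    rtp = struct.pack("!BBHII", b0, b1, seq & 0xFFFF,
                      rtp_ts & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    ext = struct.pack("!HH", OSTP_PROFILE, OSTP_EXT_LEN // 4)
    ostp = struct.pack("!HHI", stream_id, sequence_ext, media_timestamp)
    return rtp + ext + ostp

def unpack_headers(data: bytes) -> dict:
    """Parse and validate the header region; raises ValueError for non-OSTP packets."""
    if len(data) < HEADERS_LEN:
        raise ValueError("packet too short")
    b0, b1, seq, rtp_ts, ssrc = struct.unpack_from("!BBHII", data, 0)
    if (b0 >> 6) != 2 or not (b0 & 0x10):
        raise ValueError("not RTP v2 with a header extension")
    profile, words = struct.unpack_from("!HH", data, RTP_HEADER_LEN)
    if profile != OSTP_PROFILE or words != OSTP_EXT_LEN // 4:
        raise ValueError("missing OSTP extension")
    stream_id, sequence_ext, media_ts = struct.unpack_from(
        "!HHI", data, RTP_HEADER_LEN + EXT_HEADER_LEN)
    return {"payload_type": b1 & 0x7F, "seq": seq, "rtp_timestamp": rtp_ts,
            "ssrc": ssrc, "stream_id": stream_id,
            "sequence_ext": sequence_ext, "media_timestamp": media_ts}
```

The audio payload would then start at offset `HEADERS_LEN`.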

Profile: 0x4F53 "OS"

The profile-defined field of the RTP header extension. The value 0x4F53 (ASCII "OS") marks the extension as OSTP.

stream_id (u16)

Logical stream identifier. Used for multiplexing multiple channels.

sequence_ext (u16)

Upper 16 bits of an extended sequence number whose lower 16 bits are the RTP sequence number. Prevents wraparound in long-running streams.
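The sketch below shows one way the two counters could combine. It assumes `sequence_ext` holds the upper 16 bits and increments each time the RTP sequence number wraps. The helper names are hypothetical.

```python
from typing import Tuple

def extended_seq(sequence_ext: int, rtp_seq: int) -> int:
    """32-bit sequence number: sequence_ext in the high half, RTP seq in the low half."""
    return ((sequence_ext & 0xFFFF) << 16) | (rtp_seq & 0xFFFF)

def next_seq(sequence_ext: int, rtp_seq: int) -> Tuple[int, int]:
    """Sender side: advance the RTP counter and carry into sequence_ext on wrap."""
    rtp_seq = (rtp_seq + 1) & 0xFFFF
    if rtp_seq == 0:
        sequence_ext = (sequence_ext + 1) & 0xFFFF
    return sequence_ext, rtp_seq
```

For scale, assume 50 packets per second (20 ms frames, an assumed rate). At that rate the plain 16-bit RTP counter wraps after about 22 minutes, while the 32-bit extended counter lasts roughly 994 days.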

media_timestamp (u32)

Media timestamp based on a 48kHz sample clock. Used for multi-device synchronization.

CRC-32

Integrity check over the entire packet: the headers and payload, i.e. every byte before the trailer. Corrupted packets can be discarded immediately.
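The trailer check can be sketched with Python's `zlib.crc32`. The source does not specify the polynomial, the byte order, or the exact coverage. This sketch therefore assumes the standard IEEE CRC-32 (the one `zlib.crc32` computes), a big-endian 4-byte trailer, and coverage of every byte before the trailer.

```python
import struct
import zlib

# zlib.crc32 implements the IEEE CRC-32; its standard check value is
# zlib.crc32(b"123456789") == 0xCBF43926.

def append_crc(packet_without_trailer: bytes) -> bytes:
    """Append a big-endian CRC-32 of all preceding bytes (assumed coverage)."""
    crc = zlib.crc32(packet_without_trailer) & 0xFFFFFFFF
    return packet_without_trailer + struct.pack("!I", crc)

def verify_crc(packet: bytes) -> bool:
    """Return False for packets that are too short or whose trailer does not match."""
    if len(packet) < 4:
        return False
    body, trailer = packet[:-4], packet[-4:]
    (expected,) = struct.unpack("!I", trailer)
    return (zlib.crc32(body) & 0xFFFFFFFF) == expected
```

A receiver would call `verify_crc` before any header parsing and drop the packet on `False`.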

Maximum packet size: 12,316 bytes

12 + 4 + 8 + 12,288 + 4 = 12,316. Full-size packets target jumbo-frame LANs whose MTU can carry them together with the IP and UDP headers (12,344 bytes over IPv4). Fragmentation control is applied on WANs.
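The same arithmetic can be turned into a payload budget for a given link MTU. This hypothetical helper assumes a minimal IPv4 header (20 bytes) and a UDP header (8 bytes). It caps the payload so the datagram is never fragmented. That is one plausible way the WAN fragmentation control could work, not a documented mechanism.

```python
RTP_HEADER = 12
RTP_EXTENSION = 4
OSTP_EXTENSION = 8
CRC_TRAILER = 4
MAX_PAYLOAD = 12_288

OSTP_OVERHEAD = RTP_HEADER + RTP_EXTENSION + OSTP_EXTENSION + CRC_TRAILER  # 28 bytes
MAX_PACKET = OSTP_OVERHEAD + MAX_PAYLOAD                                  # 12,316 bytes
IPV4_UDP_OVERHEAD = 20 + 8  # IPv4 header without options + UDP header

def payload_budget(link_mtu: int, ip_udp_overhead: int = IPV4_UDP_OVERHEAD) -> int:
    """Largest audio payload that fits in one unfragmented datagram (hypothetical)."""
    budget = link_mtu - ip_udp_overhead - OSTP_OVERHEAD
    return max(0, min(budget, MAX_PAYLOAD))
```

Under these assumptions, a standard 1500-byte Ethernet path leaves 1,444 bytes of audio per packet. A common 9000-byte jumbo-frame LAN leaves 8,944 bytes. Only a link MTU of at least 12,344 bytes carries the full 12,288-byte payload.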

Connection Modes

Four Connection Modes

The optimal mode is selected automatically based on your network environment. Modes range from ultra-low latency on a LAN to global distribution, and switching between them is seamless.

⚡

LAN Mode

< 5ms

Automatic discovery of devices on the same network via mDNS/Bonjour. Direct packet transmission using multicast UDP with zero routing overhead for the lowest possible latency. Ideal for perfectly synchronized multi-speaker home audio.

🔗

P2P Mode

2–4 devices, <50ms

Direct peer-to-peer connections across the internet, using STUN/TURN for NAT traversal. Ideal for small private sessions and remote DJ monitoring. When a direct path is established, no relay server carries the audio, so there is no relay bandwidth cost.

🌐

Relay Mode

5+ devices, ~100ms

Distribution via geo-distributed relay servers deployed on Fly.io. Automatically selected for 5+ devices or environments where NAT traversal is difficult. Relay servers only forward packets without decoding or re-encoding.

⚖

Hybrid Mode

Auto-select

Measures network conditions in real-time and automatically selects the optimal mode. Multicast for LAN devices, P2P or Relay for WAN listeners. Mode transitions are seamless with no interruption to listeners.
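The selection rules in the four mode descriptions can be condensed into a small decision function. This is a hypothetical sketch built on three assumptions. First, the device count includes the broadcaster. Second, "same network" and "NAT traversal feasible" are already known as booleans for each listener. Third, the real-time measurements that Hybrid Mode relies on are left out.

```python
from enum import Enum
from typing import Dict, Tuple

class Mode(Enum):
    LAN = "lan"      # mDNS discovery + multicast UDP
    P2P = "p2p"      # STUN/TURN-assisted peer connections
    RELAY = "relay"  # geo-distributed relay servers

P2P_MAX_DEVICES = 4  # P2P for 2-4 devices, relay for 5+

def select_mode(device_count: int, on_same_lan: bool, nat_traversal_ok: bool) -> Mode:
    """Pick a transport for one listener (hypothetical rule set)."""
    if on_same_lan:
        return Mode.LAN
    if device_count > P2P_MAX_DEVICES or not nat_traversal_ok:
        return Mode.RELAY
    return Mode.P2P

def hybrid_plan(listeners: Dict[str, Tuple[bool, bool]]) -> Dict[str, Mode]:
    """Hybrid mode: apply select_mode per listener.

    Values are (on_same_lan, nat_traversal_ok) for each listener.
    """
    device_count = len(listeners) + 1  # listeners plus the broadcaster (assumption)
    return {name: select_mode(device_count, lan, nat)
            for name, (lan, nat) in listeners.items()}
```

Evaluating each listener separately lets LAN devices stay on multicast while WAN listeners in the same session use P2P or relay.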

Security

Multi-layered security, from the transport layer to the application layer, enforced at the protocol level.

🔑

HMAC-SHA256

Fund deposit operations (CHARGE command) are signed with HMAC-SHA256. Tampered deposit requests are immediately rejected by the relay, preventing unauthorized balance manipulation.
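Here is a minimal signing sketch using Python's `hmac` and `hashlib` modules. The source only states that CHARGE requests are signed with HMAC-SHA256 and verified by the relay. The request fields (`cmd`, `user`, `amount`, `nonce`), the shared secret, and the canonical-JSON encoding are all hypothetical.

```python
import hashlib
import hmac
import json

def canonical_bytes(payload: dict) -> bytes:
    """Serialize deterministically so signer and relay hash identical bytes (assumed encoding)."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")

def sign_charge(secret: bytes, payload: dict) -> str:
    """Hex-encoded HMAC-SHA256 over the canonical request body."""
    return hmac.new(secret, canonical_bytes(payload), hashlib.sha256).hexdigest()

def verify_charge(secret: bytes, payload: dict, signature: str) -> bool:
    """Relay side: recompute the MAC and compare in constant time."""
    return hmac.compare_digest(sign_charge(secret, payload), signature)

# Hypothetical CHARGE request; the nonce field is illustrative only.
request = {"cmd": "CHARGE", "user": "u1", "amount": 500, "nonce": "n-001"}
```

Any change to the body, such as an edited `amount`, produces a different MAC, so the relay rejects it. `hmac.compare_digest` avoids leaking timing information during the comparison.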

🔐

DTLS-SRTP

DTLS-SRTP encryption is optional and is enabled only when certificates are configured at relay startup. When enabled, it prevents eavesdropping on P2P and relay paths, with keys exchanged automatically via the DTLS handshake.

🔒

TLS (Financial Operations)

All financial operations including wallet API, payment processing, and royalty distribution are executed over HTTPS encrypted with TLS 1.3. No financial data is ever carried over the UDP transport.

Layered Economy

Economy Layer

Transport and economic logic are clearly separated. The protocol focuses on audio transmission, while business logic is handled at the application layer.

Layer	Component	Responsibilities	Protocol
Economy	Wallet API	Royalty distribution, micropayments, tips, subscriptions	HTTPS REST
Control	HTTPS API	Channel management, authentication, copyright detection, metadata delivery	HTTPS + WSS
Transport	OSTP / RTP	Real-time audio packet transmission, P2P swarm, relay	UDP

Revenue Split Model

Micropayments are executed automatically, in real time, for each play.

Recipient	Share
Rights holders (artists & labels)	70%
DJ (cashback)	20%
Platform (PF)	10%
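A per-play 70/20/10 split can be sketched in integer minor units (e.g. cents) to avoid floating-point rounding. Two details here are assumptions: the party keys, and the choice to give any leftover units from floor division to the rights holders.

```python
from typing import Dict

# Shares in basis points (1/100 of a percent): 70% / 20% / 10%.
SPLIT_BPS = {"rights_holders": 7000, "dj": 2000, "platform": 1000}

def split_payment(amount_minor: int) -> Dict[str, int]:
    """Split one play's payment, given in integer minor units."""
    if amount_minor < 0:
        raise ValueError("amount must be non-negative")
    shares = {party: amount_minor * bps // 10_000 for party, bps in SPLIT_BPS.items()}
    # Floor division can leave a few units unassigned; give them to rights holders (assumption).
    shares["rights_holders"] += amount_minor - sum(shares.values())
    return shares
```

Because the remainder is reassigned, the shares always add up exactly to the amount paid.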

Design Principle: Business logic (payments, copyright detection, user management) is not part of the protocol layer. All of it is implemented via HTTPS APIs at the application layer. OSTP focuses exclusively on audio packet transmission and is completely independent of the economy layer.

Distributed Detection

Listener-Side Fingerprint Detection

Copyright detection is not performed by the broadcaster or the relay. Instead, listener devices generate audio fingerprints and submit them to the relay, where a consensus algorithm confirms the track's identity. More listeners means higher accuracy, following the same philosophy as P2P.

🎧

1. Listener Captures Audio

Every 30 seconds, the listener device (iPhone or browser) computes a 64-bit fingerprint hash from the audio it is already decoding. Because the audio is already in memory, no extra capture is needed.

📤

2. Hash Submitted to Relay

The fingerprint report (~200 bytes) is sent to the relay server over the existing connection. The bandwidth impact is negligible: less than 0.1% of the audio stream itself.

🤝

3. Consensus Algorithm

The relay aggregates reports from multiple listeners. When 2 or more listeners report matching hashes (Hamming distance ≤ 8 bits), the track is confirmed. A majority vote filters out fraudulent reports.

💰

4. Match & Distribute

Confirmed fingerprints are matched against the music database. Royalties are automatically distributed: 70% to rights holders, 20% DJ cashback, 10% platform.
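The consensus step described above can be sketched as a small tracker on the relay. It uses the parameters from the specification: Hamming distance ≤ 8 bits, at least 2 agreeing listeners, and 60-second report retention. The source does not say what population the majority vote is taken over. This sketch assumes a majority of the distinct listeners with reports still inside the retention window. Class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

HAMMING_TOLERANCE = 8        # bits
MIN_AGREEING_LISTENERS = 2   # consensus threshold
RETENTION_S = 60.0           # reports older than this are discarded

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two 64-bit fingerprints."""
    return bin(a ^ b).count("1")

@dataclass
class ConsensusTracker:
    # (received_at, listener_id, fingerprint) for each report still in the window
    reports: List[Tuple[float, str, int]] = field(default_factory=list)

    def submit(self, listener_id: str, fingerprint: int, now: float) -> Optional[int]:
        """Record a report; return the fingerprint if consensus is reached, else None."""
        self.reports = [r for r in self.reports if now - r[0] <= RETENTION_S]
        self.reports.append((now, listener_id, fingerprint))
        reporters = {lid for _, lid, _ in self.reports}
        agreeing = {lid for _, lid, fp in self.reports
                    if hamming(fp, fingerprint) <= HAMMING_TOLERANCE}
        # Distinct-listener threshold plus a majority of all current reporters.
        if len(agreeing) >= MIN_AGREEING_LISTENERS and 2 * len(agreeing) > len(reporters):
            return fingerprint
        return None
```

Agreement is counted over distinct listener IDs, so one device submitting the same hash repeatedly cannot reach the threshold on its own.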

Technical Specifications

Parameter	Value
Audio window	5 seconds of audio per hash
Fingerprint size	64-bit hash (8 frequency bands × 32 time windows)
Frequency bands	200, 400, 800, 1600, 3200, 6400, 12800, 25600 Hz
Submission interval	Every 30 seconds
Consensus threshold	2+ listeners reporting same hash
Match tolerance	Hamming distance ≤ 8 bits
Report retention	60 seconds (auto-cleanup)
Compute framework	Accelerate (iOS) / Web Audio API (browser)
Battery impact	Near zero (audio already decoded, adds only FFT)
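The table gives an 8 × 32 grid of band/window cells but a 64-bit hash. It does not say how the 256 cells are reduced to 64 bits. This sketch assumes one possible reduction. Each band is summed over 8 groups of 4 windows, and a bit is set when a group's energy exceeds that band's mean. The FFT and band-energy step (Accelerate on iOS, Web Audio API in browsers) is omitted, and the input is assumed to be the precomputed energy grid. `fingerprint64` is a hypothetical name.

```python
from typing import Sequence

N_BANDS = 8
N_WINDOWS = 32
N_GROUPS = 8                              # 8 bands x 8 groups = 64 bits (assumption)
WINDOWS_PER_GROUP = N_WINDOWS // N_GROUPS  # 4 windows per group

def fingerprint64(energies: Sequence[Sequence[float]]) -> int:
    """energies[band][window]: 8 x 32 band energies for one 5-second audio window."""
    if len(energies) != N_BANDS or any(len(row) != N_WINDOWS for row in energies):
        raise ValueError("expected an 8 x 32 energy grid")
    fp = 0
    for band, row in enumerate(energies):
        groups = [sum(row[i:i + WINDOWS_PER_GROUP])
                  for i in range(0, N_WINDOWS, WINDOWS_PER_GROUP)]
        mean = sum(groups) / len(groups)
        for g, energy in enumerate(groups):
            if energy > mean:  # bit = "louder than this band's average"
                fp |= 1 << (band * N_GROUPS + g)
    return fp
```

Comparing each group against its own band's mean makes the bits insensitive to overall volume. Small spectral differences between listeners then flip only a few bits, which the Hamming tolerance absorbs.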

Why Listener-Side? Traditional fingerprinting systems place the detection burden on the broadcaster or a central server. SOLUNA distributes this work across listeners — the devices are already decoding the audio, so adding an FFT pass is computationally trivial. This means zero load on the broadcaster, and accuracy scales naturally with audience size.

P2P Swarm

With a Fanout-4 tree structure, the network grows stronger as listeners increase. Mobile detection and redundant parent failover ensure zero interruptions.

Figure: swarm tree from Source through Relay to WiFi listeners and mobile (leaf) listeners.
☆
Fanout-4 Tree Structure

Each node forwards packets to up to 4 child nodes, so the number of listeners grows exponentially with tree depth. With 16 forwarding nodes at depth 2, 64 listeners are reached; with 64 forwarding nodes at depth 3, up to 256.
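The capacity arithmetic for a fanout-4 tree can be checked with a few lines of Python. The depth convention is an assumption: the root (source or relay) sits at depth 0.

```python
FANOUT = 4  # children per forwarding node

def nodes_at_depth(depth: int, fanout: int = FANOUT) -> int:
    """Maximum number of nodes at a given depth; the root is depth 0."""
    return fanout ** depth

def listeners_fed_by(depth: int, fanout: int = FANOUT) -> int:
    """Listeners one level below when every node at `depth` forwards to a full set of children."""
    return nodes_at_depth(depth, fanout) * fanout

def depth_needed(listeners: int, fanout: int = FANOUT) -> int:
    """Smallest depth whose levels 1..depth can hold `listeners` nodes in total."""
    depth, capacity = 0, 0
    while capacity < listeners:
        depth += 1
        capacity += fanout ** depth
    return depth
```

Each additional level adds one forwarding hop. End-to-end latency therefore grows with tree depth, i.e. logarithmically with audience size.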

📱
Automatic Mobile Detection

Listeners on cellular connections (4G/5G) are automatically placed as leaf nodes. This prevents unstable mobile connections from becoming relays and avoids cascading disconnections for downstream listeners.

🛡
Redundant Parent Failover

Each node keeps a connection to a secondary parent in addition to its primary parent. If the primary parent leaves, the node automatically switches to the secondary parent within 50ms, and playback continues without interruption.
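The placement rules above (fanout 4, cellular listeners as leaves, a standby secondary parent) can be sketched as a breadth-first tree builder. This is a hypothetical model with no networking and no timing, so the 50ms switchover is not represented. The class and function names are illustrative, and raising an error stands in for the relay fallback.

```python
from dataclasses import dataclass, field
from typing import List, Optional

FANOUT = 4

@dataclass(eq=False)  # identity comparison: nodes reference each other in cycles
class Node:
    name: str
    cellular: bool = False  # 4G/5G listener: always kept as a leaf
    children: List["Node"] = field(default_factory=list)
    primary: Optional["Node"] = None
    secondary: Optional["Node"] = None

    def can_forward(self) -> bool:
        """Only non-cellular nodes with a free child slot may accept children."""
        return not self.cellular and len(self.children) < FANOUT

def attach(root: Node, newcomer: Node) -> None:
    """Breadth-first search for the two shallowest nodes with free capacity."""
    queue, candidates = [root], []
    while queue and len(candidates) < 2:
        node = queue.pop(0)
        if node.can_forward():
            candidates.append(node)
        queue.extend(node.children)
    if not candidates:
        raise RuntimeError("tree is full: fall back to relay mode")
    newcomer.primary = candidates[0]
    newcomer.secondary = candidates[1] if len(candidates) > 1 else None
    newcomer.primary.children.append(newcomer)

def fail_over(node: Node) -> None:
    """The primary parent left: promote the standby (secondary) parent."""
    backup = node.secondary
    if backup is None or not backup.can_forward():
        raise RuntimeError("no usable secondary parent: fall back to relay mode")
    if node.primary is not None and node in node.primary.children:
        node.primary.children.remove(node)
    node.primary, node.secondary = backup, None
    backup.children.append(node)
```

In this sketch the secondary slot is not reserved in advance, so `fail_over` rechecks capacity at switch time. Also, the very first listener has no secondary parent, because only the root exists when it joins.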

📊
Relay Fallback

If the P2P tree fails to build, or if peer connections are impossible due to Symmetric NAT, the system automatically falls back to relay mode. Connectivity is always guaranteed.

Documentation