RFC 3550 Compatible

OSTP
Open Sonic Transport Protocol

A real-time audio transport protocol built on top of RTP.
From ultra-low latency under 5ms on LAN to global P2P swarm distribution,
a single protocol for audio transmission at any scale.

Packet Format

An OSTP packet is a standard RTP packet: a 12-byte RTP header, a 4-byte RTP header-extension header, and an 8-byte OSTP extension, followed by the audio payload and a 4-byte CRC-32 integrity trailer.

Section | Contents | Size
RTP Header | V=2, PT, SSRC, Seq | 12 bytes
RTP Extension | Profile 0x4F53 | 4 bytes
OSTP Extension | stream_id, seq_ext, ts | 8 bytes
Audio Payload | Opus / PCM / AAC | max 12,288 bytes
CRC-32 | Trailer | 4 bytes
Profile: 0x4F53 ("OS")
Value of the 16-bit profile-defined field of the RTP header extension, identifying the extension as OSTP. 0x4F53 is the ASCII encoding of "OS".
stream_id (u16)
Logical stream identifier. Used for multiplexing multiple channels.
sequence_ext (u16)
Upper 16 bits of a 32-bit extended sequence number; the RTP header's 16-bit sequence number supplies the lower 16 bits. Prevents wraparound in long-running streams.
media_timestamp (u32)
Media timestamp based on a 48kHz sample clock. Used for multi-device synchronization.
CRC-32
Computed over all preceding bytes of the packet (headers and payload), so corrupted packets can be discarded immediately.
Max packet size: 12,316 bytes
12 + 4 + 8 + 12,288 + 4 = 12,316 bytes, excluding UDP/IP headers. Full-size packets target jumbo-frame-capable LANs for maximum performance; on WANs, fragmentation control is applied.
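The layout above can be illustrated with a small serializer and validator. This is a minimal sketch, not the reference implementation: it assumes network byte order, no CSRC entries, payload type 96 (a dynamic RTP payload type), an RTP timestamp that mirrors media_timestamp, and a CRC-32 computed with zlib's polynomial over every preceding byte and stored big-endian. None of these encoding details are stated in the layout itself.

```python
import struct
import zlib

OSTP_PROFILE = 0x4F53   # RTP extension profile-defined field, ASCII "OS"
EXT_WORDS = 2           # 8-byte OSTP extension = 2 x 32-bit words (RFC 3550 length field)
MAX_PAYLOAD = 12_288
RTP_FMT = "!BBHII"      # V/P/X/CC, M/PT, seq (low 16 bits), timestamp, SSRC: 12 bytes
EXT_FMT = "!HHHHI"      # profile, length, stream_id, seq_ext, media_timestamp: 12 bytes
HDR_LEN = struct.calcsize(RTP_FMT) + struct.calcsize(EXT_FMT)  # 24 bytes


def build_packet(ssrc: int, ext_seq: int, stream_id: int,
                 media_ts: int, payload: bytes, pt: int = 96) -> bytes:
    """Serialize one OSTP packet (assumed encoding, see lead-in)."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload exceeds 12,288 bytes")
    first = (2 << 6) | (1 << 4)  # V=2, P=0, X=1 (header extension present), CC=0
    rtp = struct.pack(RTP_FMT, first, pt & 0x7F, ext_seq & 0xFFFF,
                      media_ts & 0xFFFFFFFF,  # assumed: RTP timestamp mirrors media_timestamp
                      ssrc)
    ext = struct.pack(EXT_FMT, OSTP_PROFILE, EXT_WORDS, stream_id,
                      (ext_seq >> 16) & 0xFFFF, media_ts & 0xFFFFFFFF)
    body = rtp + ext + payload
    # Assumed: zlib's CRC-32 over every preceding byte, stored big-endian.
    return body + struct.pack("!I", zlib.crc32(body))


def parse_packet(pkt: bytes) -> dict:
    """Validate and decode an OSTP packet; raise ValueError so callers can discard it."""
    if len(pkt) < HDR_LEN + 4 or len(pkt) > HDR_LEN + MAX_PAYLOAD + 4:
        raise ValueError("bad packet length")
    body, (crc,) = pkt[:-4], struct.unpack("!I", pkt[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("CRC-32 mismatch")
    first, mpt, seq_lo, _rtp_ts, ssrc = struct.unpack_from(RTP_FMT, body, 0)
    profile, words, stream_id, seq_hi, media_ts = struct.unpack_from(
        EXT_FMT, body, struct.calcsize(RTP_FMT))
    if (first >> 6) != 2 or not (first & 0x10) or (first & 0x0F) != 0:
        raise ValueError("unexpected RTP header flags")
    if profile != OSTP_PROFILE or words != EXT_WORDS:
        raise ValueError("not an OSTP extension")
    return {"ssrc": ssrc, "pt": mpt & 0x7F, "stream_id": stream_id,
            "ext_seq": (seq_hi << 16) | seq_lo, "media_ts": media_ts,
            "payload": body[HDR_LEN:]}
```

A full 12,288-byte payload yields a 12,316-byte packet, matching the sum above. The wraparound arithmetic also shows why the extensions matter: at an example rate of 50 packets per second, the bare 16-bit RTP sequence number wraps after about 22 minutes, while the 32-bit extended number lasts roughly 2.7 years. The u32 media_timestamp on a 48 kHz clock wraps after 2^32 / 48,000 ≈ 89,478 seconds, or about 24.9 hours.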
Connection Modes

Four Connection Modes

The optimal mode is selected automatically based on your network environment, and switching between modes is seamless, from ultra-low latency on LAN to global distribution.

LAN Mode

< 5ms

Automatic discovery of devices on the same network via mDNS/Bonjour. Direct packet transmission using multicast UDP with zero routing overhead for the lowest possible latency. Ideal for perfectly synchronized multi-speaker home audio.

🔗

P2P Mode

2–4 devices, <50ms

Direct peer-to-peer connections across the internet, using STUN/TURN for NAT traversal. Ideal for small private sessions and remote DJ monitoring. There is no relay bandwidth cost, since no relay server is involved.

🌐

Relay Mode

5+ devices, ~100ms

Distribution via geo-distributed relay servers deployed on Fly.io. Automatically selected for 5+ devices or environments where NAT traversal is difficult. Relay servers only forward packets without decoding or re-encoding.

Hybrid Mode

Auto-select

Measures network conditions in real-time and automatically selects the optimal mode. Multicast for LAN devices, P2P or Relay for WAN listeners. Mode transitions are seamless with no interruption to listeners.
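Hybrid Mode's per-listener decision can be sketched as a small function. This is a minimal sketch under stated assumptions: the `PeerInfo` fields, the `P2P_MAX_DEVICES` constant, and counting every peer in the session as a "device" are all hypothetical. A real implementation would rely on live measurements (latency, NAT type, mDNS discovery) rather than two booleans.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    LAN = "lan"      # multicast UDP on the local network, < 5 ms
    P2P = "p2p"      # direct STUN/TURN connections, 2-4 devices
    RELAY = "relay"  # geo-distributed relay servers, 5+ devices


@dataclass
class PeerInfo:
    same_lan: bool         # hypothetical: peer discovered via mDNS on the local network
    nat_traversable: bool  # hypothetical: False e.g. behind a symmetric NAT


P2P_MAX_DEVICES = 4  # from the mode table: P2P covers 2-4 devices


def select_mode(peer: PeerInfo, session_devices: int) -> Mode:
    """Pick a transport mode for one listener."""
    if peer.same_lan:
        return Mode.LAN
    if session_devices <= P2P_MAX_DEVICES and peer.nat_traversable:
        return Mode.P2P
    # 5+ devices, or NAT traversal is not possible: fall back to relay
    return Mode.RELAY


def hybrid_plan(peers: dict) -> dict:
    """Assign a mode to every peer independently (assumed Hybrid behavior)."""
    n = len(peers)
    return {name: select_mode(info, n) for name, info in peers.items()}
```

Evaluating each listener separately is what lets one session mix multicast for LAN devices with P2P or relay delivery for WAN listeners.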

Security

Security is layered from the transport layer to the application layer: HMAC-SHA256 signatures on deposit commands, optional DTLS-SRTP encryption on audio paths, and TLS 1.3 for all financial APIs.

🔑

HMAC-SHA256

Fund deposit operations (CHARGE command) are signed with HMAC-SHA256. Tampered deposit requests are immediately rejected by the relay, preventing unauthorized balance manipulation.
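Signing and verifying a CHARGE request with HMAC-SHA256 can be sketched with Python's standard `hmac` module. The message layout (`CHARGE|account|amount|nonce`), the use of integer minor currency units, and the shared-key setup are hypothetical; the actual CHARGE wire format is not described here.

```python
import hashlib
import hmac


def charge_message(account: str, amount_minor: int, nonce: str) -> bytes:
    # Hypothetical canonical encoding of a CHARGE command.
    return f"CHARGE|{account}|{amount_minor}|{nonce}".encode("utf-8")


def sign_charge(key: bytes, account: str, amount_minor: int, nonce: str) -> str:
    """Return the hex HMAC-SHA256 tag for a CHARGE command."""
    msg = charge_message(account, amount_minor, nonce)
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def verify_charge(key: bytes, account: str, amount_minor: int,
                  nonce: str, signature: str) -> bool:
    """Recompute the tag and compare; any tampered field fails verification."""
    expected = sign_charge(key, account, amount_minor, nonce)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)
```

Changing any field (for example the amount) changes the tag, so a relay holding the key can reject the request outright.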

🔐

DTLS-SRTP

DTLS-SRTP encryption is an optional feature, enabled only when certificates are configured at relay startup. When enabled, it protects the P2P and relay paths against eavesdropping, with keys exchanged automatically via the DTLS handshake.

🔒

TLS (Financial Operations)

All financial operations including wallet API, payment processing, and royalty distribution are executed over HTTPS encrypted with TLS 1.3. No financial data is ever carried over the UDP transport.

Layered Economy

Economy Layer

Transport and economic logic are clearly separated. The protocol focuses on audio transmission, while business logic is handled at the application layer.

Layer | Component | Responsibilities | Transport
Economy | Wallet API | Royalty distribution, micropayments, tips, subscriptions | HTTPS REST
Control | HTTPS API | Channel management, authentication, copyright detection, metadata delivery | HTTPS + WSS
Transport | OSTP / RTP | Real-time audio packet transmission, P2P swarm, relay | UDP

Revenue Split Model

Micropayments are automatically executed in real-time for each play.

Rights Holders (Artists & Labels): 70%
DJ (Cashback): 20%
Platform (PF): 10%
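Applying the 70/20/10 split to a single micropayment takes one rounding decision, because small amounts rarely divide evenly. This sketch assumes amounts are held as integers in minor currency units (e.g. cents) and that the leftover from integer division goes to the platform; both are hypothetical choices, not documented behavior.

```python
SPLIT_PERCENT = {"rights_holders": 70, "dj": 20, "platform": 10}


def split_payment(amount_minor: int) -> dict:
    """Split an amount in minor currency units by the 70/20/10 model."""
    if amount_minor < 0:
        raise ValueError("amount must be non-negative")
    shares = {k: amount_minor * p // 100 for k, p in SPLIT_PERCENT.items()}
    # Assumed rounding rule: units lost to integer division go to the platform,
    # so the shares always sum exactly to the original amount.
    shares["platform"] += amount_minor - sum(shares.values())
    return shares
```

Integer units avoid floating-point drift when millions of per-play payments are summed.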
Design Principle: Business logic (payments, copyright detection, user management) is not part of the protocol layer. All of it is implemented via HTTPS APIs at the application layer. OSTP focuses exclusively on audio packet transmission and is completely independent of the economy layer.
Distributed Detection

Listener-Side Fingerprint Detection

Copyright detection is not performed by the broadcaster or relay. Instead, listener devices generate audio fingerprints and submit them to the relay, where a consensus algorithm confirms track identity. More listeners mean higher accuracy, following the same philosophy as P2P.

🎧

1. Listener Captures Audio

Every 30 seconds, the listener device (iPhone or browser) computes a 64-bit fingerprint hash from a 5-second window of the audio it is already decoding. The audio is already in memory, so no extra capture is needed.

📤

2. Hash Submitted to Relay

The hash report (~200 bytes) is sent to the relay server over the existing connection. At one report every 30 seconds this is about 53 bit/s, which is less than 0.1% of the audio stream for any bitrate above roughly 53 kbit/s.

🤝

3. Consensus Algorithm

The relay aggregates reports from multiple listeners. When 2 or more listeners report the same hash (within a Hamming distance of 8 bits), the track is confirmed. Majority voting across listeners filters out isolated fraudulent reports.

💰

4. Match & Distribute

Confirmed fingerprints are matched against the music database. Royalties are automatically distributed: 70% to rights holders, 20% DJ cashback, 10% platform.

Technical Specifications

Parameter | Value
Audio window | 5 seconds of audio per hash
Fingerprint size | 64-bit hash (8 frequency bands × 32 time windows)
Frequency bands | 200, 400, 800, 1600, 3200, 6400, 12800, 25600 Hz
Submission interval | Every 30 seconds
Consensus threshold | 2+ listeners reporting the same hash
Match tolerance | Hamming distance ≤ 8 bits
Report retention | 60 seconds (auto-cleanup)
Compute framework | Accelerate (iOS) / Web Audio API (browser)
Battery impact | Near zero (audio already decoded; adds only an FFT)
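The consensus rule in the table (2+ distinct listeners, Hamming distance ≤ 8 bits, 60-second retention) can be sketched as below. This is a simple threshold check, not the relay's actual majority-vote code: the `Report` structure, the deduplication of near-identical confirmed hashes, and the quadratic scan are illustrative assumptions.

```python
from dataclasses import dataclass

HAMMING_TOLERANCE = 8    # max differing bits for two hashes to match
CONSENSUS_THRESHOLD = 2  # distinct listeners required to confirm
RETENTION_S = 60.0       # reports older than this are ignored


@dataclass
class Report:
    listener_id: str
    fingerprint: int     # 64-bit hash
    received_at: float   # seconds, relay clock (hypothetical field)


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def confirm_tracks(reports: list, now: float) -> list:
    """Return fingerprints confirmed by enough distinct listeners."""
    live = [r for r in reports if now - r.received_at <= RETENTION_S]
    confirmed = []
    for r in live:
        if any(hamming(r.fingerprint, c) <= HAMMING_TOLERANCE for c in confirmed):
            continue  # already confirmed via a near-identical hash
        # Count distinct listeners so one device cannot confirm a track alone.
        agreeing = {o.listener_id for o in live
                    if hamming(r.fingerprint, o.fingerprint) <= HAMMING_TOLERANCE}
        if len(agreeing) >= CONSENSUS_THRESHOLD:
            confirmed.append(r.fingerprint)
    return confirmed
```

Counting listener IDs rather than reports means repeated submissions from a single device never reach the threshold on their own.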
Why Listener-Side? Traditional fingerprinting systems place the detection burden on the broadcaster or a central server. SOLUNA distributes this work across listeners: the devices are already decoding the audio, so adding an FFT pass is computationally trivial. This means zero load on the broadcaster, and accuracy scales naturally with audience size.
P2P Swarm

With a Fanout-4 tree structure, listeners also forward audio, so the network's distribution capacity grows as listeners join. Automatic mobile detection and redundant parent failover are designed to keep playback uninterrupted.

Figure: P2P Swarm Distribution (legend: Source, Relay, WiFi Listener, Mobile (Leaf))

Fanout-4 Tree Structure

Each node forwards packets to up to 4 child nodes, so the number of listeners grows exponentially with tree depth (up to 4^d nodes at depth d). The 16 nodes at depth 2 reach 64 listeners at depth 3, and those 64 nodes can in turn reach up to 256 listeners at depth 4.

📱
Automatic Mobile Detection

Listeners on cellular connections (4G/5G) are automatically placed as leaf nodes. This prevents unstable mobile connections from becoming relays and avoids cascading disconnections for downstream listeners.
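Fanout-4 placement with mobile listeners kept at the leaves can be sketched as a breadth-first fill. The sketch assumes a simple policy that is not taken from the spec: WiFi listeners are placed first so they occupy interior (forwarding) slots, mobile listeners are never given children, and any listener that finds no open slot is returned to the caller (for example, to be served by relay mode). The `Node` type and function names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field

FANOUT = 4  # each node forwards to at most 4 children


@dataclass
class Node:
    name: str
    mobile: bool = False  # True for cellular (4G/5G) listeners
    children: list = field(default_factory=list)


def build_tree(root: Node, listeners: list) -> list:
    """Attach listeners breadth-first; return those that found no slot."""
    open_slots = deque([root])  # nodes that may still accept children
    # WiFi listeners first so they take the interior, forwarding positions.
    ordered = [n for n in listeners if not n.mobile] + [n for n in listeners if n.mobile]
    unplaced = []
    for node in ordered:
        if not open_slots:
            unplaced.append(node)  # e.g. hand off to relay mode
            continue
        parent = open_slots[0]
        parent.children.append(node)
        if len(parent.children) >= FANOUT:
            open_slots.popleft()
        if not node.mobile:
            open_slots.append(node)  # WiFi nodes can forward; mobile nodes stay leaves
    return unplaced


def capacity_at_depth(d: int) -> int:
    """Maximum number of nodes at depth d below a single root."""
    return FANOUT ** d
```

With four WiFi listeners under the root, sixteen mobile listeners fill their child slots as leaves; a seventeenth mobile listener would come back as unplaced.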

🛡
Redundant Parent Failover

Each node maintains a connection to a secondary parent in addition to its primary parent. If the primary parent leaves, the node automatically switches to the secondary parent within 50ms, so listeners continue without interruption.
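Failover to a pre-connected secondary parent can be sketched as a silence timer. The 40 ms detection threshold is an assumption chosen so that detection plus switchover fits inside the 50ms budget; the `ParentLink` class, its method names, and the caller-supplied clock are hypothetical.

```python
from typing import Optional

DETECT_TIMEOUT_S = 0.040  # assumed silence threshold, inside the 50 ms budget


class ParentLink:
    """Tracks the active parent and a pre-connected standby for one swarm node."""

    def __init__(self, primary: str, secondary: Optional[str], now: float):
        self.active = primary
        self.standby = secondary
        self.last_rx = now

    def on_packet(self, sender: str, now: float) -> None:
        # Only packets from the active parent count as proof of liveness.
        if sender == self.active:
            self.last_rx = now

    def tick(self, now: float) -> str:
        """Promote the standby parent if the active one has gone silent."""
        if now - self.last_rx > DETECT_TIMEOUT_S and self.standby is not None:
            self.active, self.standby = self.standby, None
            self.last_rx = now  # give the new parent a fresh window
        return self.active
```

Because the secondary connection already exists, promotion is a local state change; acquiring a new standby afterwards is left to the swarm's parent-selection logic.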

📊
Relay Fallback

If the P2P tree fails to build, or if peer connections are impossible due to Symmetric NAT, the system automatically falls back to relay mode, so listeners stay connected even when direct peer links cannot be established.

Documentation

Read the Full Spec

📖

Read the Full Specification

Complete English specification of OSTP, covering packet format, state machines, and error handling.

protocol.md
🇯🇵

Japanese Spec

Complete Japanese specification of the Open Sonic Transport Protocol. For Japanese-speaking contributors and integrators.

protocol-ja.md

GitHub

Source code, Issues, and Pull Requests. Fully open source under the MIT License.

yukihamada/opensonic