A real-time audio transport protocol built on top of RTP.
From ultra-low latency under 5 ms on a LAN to global P2P swarm distribution: a single protocol for audio transmission at any scale.
The OSTP packet consists of a standard RTP header with an OSTP extension on top, followed by the audio payload and a CRC-32 integrity check.
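The layout described above can be sketched as follows. This is a minimal illustration, not the reference encoder: the RTP fixed header and the header-extension framing follow RFC 3550, but the extension profile ID `OSTP_EXT_PROFILE`, the contents of the extension, the payload type, and the choice to compute the CRC-32 over everything that precedes it are assumptions made for the example.

```python
import struct
import zlib

# Hypothetical 16-bit profile identifier for the OSTP header extension.
OSTP_EXT_PROFILE = 0x4F53


def build_packet(seq, timestamp, ssrc, ext_data, payload, payload_type=96):
    """Assemble RTP header + header extension + audio payload + CRC-32 trailer."""
    if len(ext_data) % 4:
        raise ValueError("extension data must be a multiple of 4 bytes")
    # Byte 0: version 2, no padding, extension bit set, zero CSRC entries.
    byte0 = (2 << 6) | (1 << 4)
    # Byte 1: marker bit clear, 7-bit payload type.
    byte1 = payload_type & 0x7F
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    # RFC 3550 extension framing: profile id, then length in 32-bit words.
    ext = struct.pack("!HH", OSTP_EXT_PROFILE, len(ext_data) // 4) + ext_data
    body = header + ext + payload
    # Assumption: the CRC-32 covers every byte that precedes it.
    return body + struct.pack("!I", zlib.crc32(body))


def verify_packet(packet):
    """Return True if the trailing CRC-32 matches the rest of the packet."""
    if len(packet) < 20:  # 12-byte header + 4-byte ext header + 4-byte CRC
        return False
    body, trailer = packet[:-4], packet[-4:]
    return struct.unpack("!I", trailer)[0] == zlib.crc32(body)
```

A receiver would call `verify_packet` before handing the payload to the decoder and drop any packet that fails the check.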
The optimal mode is selected automatically based on your network environment, and the protocol switches between modes seamlessly, from ultra-low latency on a LAN to global distribution.
Automatic discovery of devices on the same network via mDNS/Bonjour. Direct packet transmission using multicast UDP with zero routing overhead for the lowest possible latency. Ideal for perfectly synchronized multi-speaker home audio.
Direct peer-to-peer connections across the internet, using STUN/TURN for NAT traversal. Ideal for small private sessions and remote DJ monitoring. When a direct path is established, audio does not pass through a relay server, so there is no relay bandwidth cost.
Distribution via geo-distributed relay servers deployed on Fly.io. Automatically selected for 5+ devices or environments where NAT traversal is difficult. Relay servers only forward packets without decoding or re-encoding.
Network conditions are measured in real time and the optimal mode is selected automatically: Multicast for LAN devices, P2P or Relay for WAN listeners. Mode transitions are seamless, with no interruption to listeners.
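The selection rules described in this section can be condensed into a small decision function. This is a sketch under stated assumptions: the `Listener` fields and the `select_mode` name are hypothetical, and the real protocol presumably weighs measured network conditions that are not modeled here. Only the LAN rule, the 5-device threshold, and the NAT fallback come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    MULTICAST = "multicast"
    P2P = "p2p"
    RELAY = "relay"


@dataclass
class Listener:
    on_same_lan: bool      # hypothetical flag: discovered via mDNS on the local network
    nat_traversable: bool  # hypothetical flag: a direct path could be negotiated


# From the text: relay mode is selected for 5 or more devices.
RELAY_DEVICE_THRESHOLD = 5


def select_mode(listener, device_count):
    """Pick a transport mode for one listener."""
    if listener.on_same_lan:
        return Mode.MULTICAST
    if device_count >= RELAY_DEVICE_THRESHOLD or not listener.nat_traversable:
        return Mode.RELAY
    return Mode.P2P
```

For example, `select_mode(Listener(on_same_lan=False, nat_traversable=True), device_count=3)` returns `Mode.P2P`, while the same listener in a 5-device session is routed through a relay.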
Security is layered from the transport layer to the application layer, with protections built into the protocol itself.
Fund deposit operations (CHARGE command) are signed with HMAC-SHA256. Tampered deposit requests are immediately rejected by the relay, preventing unauthorized balance manipulation.
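As an illustration of how a relay might check a signed CHARGE request, the sketch below uses Python's standard `hmac` module. The message layout, the shared secret, and the function names are assumptions for the example; the source only states that CHARGE commands are signed with HMAC-SHA256 and that tampered requests are rejected.

```python
import hashlib
import hmac


def sign_charge(secret: bytes, message: bytes) -> bytes:
    """Compute the HMAC-SHA256 tag for a CHARGE message."""
    return hmac.new(secret, message, hashlib.sha256).digest()


def verify_charge(secret: bytes, message: bytes, signature: bytes) -> bool:
    """Return True only if the signature matches the message."""
    expected = sign_charge(secret, message)
    # compare_digest runs in constant time to avoid leaking tag bytes.
    return hmac.compare_digest(expected, signature)


# Hypothetical message format, for illustration only.
secret = b"shared-secret"
request = b"CHARGE account=alice amount=500 nonce=1"
tag = sign_charge(secret, request)
tampered = request.replace(b"500", b"900")
```

With these values, `verify_charge(secret, request, tag)` succeeds and `verify_charge(secret, tampered, tag)` fails, which is the behavior the relay relies on to reject altered deposits.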
DTLS-SRTP encryption is optional and is enabled only when certificates are configured at relay startup. When enabled, it prevents eavesdropping on P2P and relay paths; key exchange is handled automatically by the DTLS handshake.
All financial operations including wallet API, payment processing, and royalty distribution are executed over HTTPS encrypted with TLS 1.3. No financial data is ever carried over the UDP transport.
Transport and economic logic are clearly separated. The protocol focuses on audio transmission, while business logic is handled at the application layer.
Micropayments are executed automatically, in real time, for each play.
Copyright detection is not performed by the broadcaster or relay. Instead, listener devices generate audio fingerprints and submit them to the relay, where a consensus algorithm confirms track identity. More listeners means higher accuracy, the same philosophy as P2P.
Every 30 seconds, the listener device (iPhone or browser) computes a 64-bit fingerprint hash from the audio it is already decoding. Because the audio is already in memory, no extra capture is needed.
The fingerprint report (~200 bytes) is sent to the relay server over the existing connection. Bandwidth impact is negligible: less than 0.1% of the audio stream itself.
The relay aggregates reports from multiple listeners. When two or more listeners report matching hashes (Hamming distance ≤ 8 bits), the track is confirmed. Requiring agreement between independent listeners filters out isolated fraudulent reports.
Confirmed fingerprints are matched against the music database. Royalties are automatically distributed: 70% to rights holders, 20% DJ cashback, 10% platform.
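The 70/20/10 split can be written out as a short calculation. This is a sketch, assuming amounts are held as integers in the smallest currency unit and that any rounding remainder goes to the platform share; neither detail is stated in the source, and `split_royalty` is a hypothetical name.

```python
def split_royalty(amount_minor: int) -> dict:
    """Split a payment (in minor currency units) 70/20/10."""
    if amount_minor < 0:
        raise ValueError("amount must be non-negative")
    rights_holders = amount_minor * 70 // 100
    dj_cashback = amount_minor * 20 // 100
    # Assumption: the platform receives whatever remains after rounding down,
    # so the three shares always sum to the original amount.
    platform = amount_minor - rights_holders - dj_cashback
    return {
        "rights_holders": rights_holders,
        "dj_cashback": dj_cashback,
        "platform": platform,
    }
```

Working in integer units avoids floating-point drift when many small payments are added up.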
| Parameter | Value |
|---|---|
| Audio window | 5 seconds of audio per hash |
| Fingerprint size | 64-bit hash (8 frequency bands × 32 time windows) |
| Frequency bands | 200, 400, 800, 1600, 3200, 6400, 12800, 25600 Hz |
| Submission interval | Every 30 seconds |
| Consensus threshold | 2+ listeners reporting same hash |
| Match tolerance | Hamming distance ≤ 8 bits |
| Report retention | 60 seconds (auto-cleanup) |
| Compute framework | Accelerate (iOS) / Web Audio API (browser) |
| Battery impact | Near zero (audio already decoded, adds only FFT) |
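
The consensus parameters in the table can be exercised with a small in-memory model of the relay side. This is a sketch rather than the relay's actual implementation: the class and method names are hypothetical, and counting distinct listener IDs (so that one device cannot confirm a track by reporting twice) is an assumption added for the example.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two 64-bit hashes."""
    return bin(a ^ b).count("1")


class FingerprintConsensus:
    def __init__(self, tolerance=8, threshold=2, retention=60.0):
        self.tolerance = tolerance    # max Hamming distance for a match
        self.threshold = threshold    # listeners required to confirm
        self.retention = retention    # seconds a report is kept
        self.reports = []             # (timestamp, listener_id, hash)

    def submit(self, listener_id, fingerprint, now):
        """Record a report; return True if the track is now confirmed."""
        # Drop reports older than the retention window.
        self.reports = [r for r in self.reports
                        if now - r[0] <= self.retention]
        self.reports.append((now, listener_id, fingerprint))
        # Collect distinct listeners whose hash is within tolerance.
        agreeing = {lid for _, lid, h in self.reports
                    if hamming(h, fingerprint) <= self.tolerance}
        return len(agreeing) >= self.threshold
```

In this model a second listener whose hash differs by up to 8 bits confirms the track, while a repeated report from the same listener, or a report that arrives after the first has expired, does not.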
With a Fanout-4 tree structure, the network grows stronger as listeners increase. Mobile detection and redundant parent failover ensure zero interruptions.
Each node forwards packets to up to 4 child nodes, so the number of listeners scales exponentially with tree depth. The 16 nodes at depth 2 reach 64 listeners, and the 64 nodes at depth 3 reach up to 256.
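The growth figures follow directly from powers of 4. The sketch below assumes the broadcaster sits at depth 0 and every node uses its full fanout; the function names are illustrative.

```python
FANOUT = 4


def nodes_at_depth(depth: int, fanout: int = FANOUT) -> int:
    """Nodes on one level of a full tree, with the broadcaster at depth 0."""
    return fanout ** depth


def listeners_reached_from(depth: int, fanout: int = FANOUT) -> int:
    """Listeners fed directly by the nodes at the given depth."""
    return nodes_at_depth(depth, fanout) * fanout


def total_listeners(max_depth: int, fanout: int = FANOUT) -> int:
    """All listeners in the tree down to max_depth, excluding the broadcaster."""
    return sum(nodes_at_depth(d, fanout) for d in range(1, max_depth + 1))
```

Here `nodes_at_depth(2)` is 16 and `listeners_reached_from(2)` is 64, matching the figures above, and `listeners_reached_from(3)` is 256.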
Listeners on cellular connections (4G/5G) are automatically placed as leaf nodes. This prevents unstable mobile connections from becoming relays and avoids cascading disconnections for downstream listeners.
Each node maintains a connection to a secondary parent in addition to its primary parent. If the primary parent leaves, the node automatically switches to the secondary parent within 50 ms, and listeners continue without interruption.
If the P2P tree fails to build, or if peer connections are impossible due to Symmetric NAT, the system automatically falls back to relay mode, so listeners stay connected even when no P2P path can be established.
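One way to picture the failover is a per-node record of both parents plus a timestamp of the last packet received. This is a simplified sketch, not the protocol's state machine: the `ParentLink` name is hypothetical, and reusing the 50 ms figure as the silence threshold that triggers the switch is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParentLink:
    primary: Optional[str]
    secondary: Optional[str]
    last_packet_at: float
    silence_timeout: float = 0.050  # assumption: 50 ms of silence triggers failover

    def on_packet(self, now: float) -> None:
        """Record that the current primary delivered a packet."""
        self.last_packet_at = now

    def check(self, now: float) -> Optional[str]:
        """Return the parent to use; None means no parent is left."""
        alive = now - self.last_packet_at <= self.silence_timeout
        if self.primary is not None and alive:
            return self.primary
        # Primary went silent: promote the secondary (which may be None).
        self.primary, self.secondary = self.secondary, None
        self.last_packet_at = now
        return self.primary
```

A return value of `None` from `check` would be the point at which a node has run out of parents and needs another path to the stream.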
Complete English specification of the Open Sonic Transport Protocol (OSTP). Covers packet format, state machines, and error handling.
Complete Japanese specification of the Open Sonic Transport Protocol. For Japanese-speaking contributors and integrators.
Source code, Issues, and Pull Requests. Fully open source under the MIT License.