| Internet-Draft | OSTP | March 2026 |
| Hamada | Expires 17 September 2026 | [Page] |
This document specifies the Open Sonic Transport Protocol (OSTP), a UDP/RTP-based protocol for real-time, multi-room audio distribution over both local area networks and the wide-area Internet. OSTP extends the Real-time Transport Protocol (RTP, RFC 3550) with an 8-byte header extension that carries a stream identifier, an extended sequence number, and a high-resolution media timestamp. The protocol defines payload types for uncompressed PCM, 32-bit floating-point PCM, and Opus-coded audio; a relay signalling protocol carried over UDP text datagrams; a WebSocket-based daemon control interface; a binary file-transfer sub-protocol; congestion control using RTCP Receiver Reports and a bitrate ladder; Forward Error Correction (FEC) via XOR parity packets; Negative Acknowledgement (NACK) based retransmission; and an optional DTLS-SRTP security layer. OSTP also defines an economic layer that enables per-listen charging, tipping, and royalty distribution anchored to on-chain wallets.¶
This document is an independent submission describing the OSTP 1.0 wire format as implemented in the OpenSonic/Soluna open-source codebase. It is not the product of an IETF working group.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 17 September 2026.¶
Copyright (c) 2026 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.¶
Distributing high-quality audio to multiple rooms or devices over a general-purpose IP network is a well-studied problem, yet existing solutions present significant deployment friction. AES67 [AES67] requires IEEE 1588 PTP grand-master infrastructure and IGMP-aware switching, making it unsuitable for consumer environments. WebRTC [RFC8835] provides peer-to-peer audio but is engineered for bi-directional voice calls and carries heavyweight browser-oriented signalling that is inappropriate for high-fidelity one-to-many streaming. RTSP [RFC7826] is a session-control protocol that relies on a separate transport layer and lacks integrated relay topologies.¶
OSTP is designed around three primary goals:¶
OSTP packets are standard RTP packets ([RFC3550]) with the RTP header extension bit set and the extension profile word set to 0x4F53 (ASCII "OS"). Implementations that do not recognise the profile will discard the extension and may attempt to render the payload according to the payload type field, which provides a limited but graceful degradation path.¶
OSTP is explicitly designed to:¶
OSTP does not attempt to:¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
The solunad process runs on the transmitting host. It exposes a WebSocket control interface on port 8400 and drives the RTP transmit pipeline.¶
OSTP separates its functions into two planes: a media plane carried over UDP datagrams, and a control plane carried over WebSocket connections.¶
All audio data is transported as RTP packets over UDP. OSTP does not define its own framing; the RTP packet structure defined in [RFC3550] is used verbatim, with the addition of the OSTP extension header (see Section 4).¶
Three network topologies are supported:¶
The three topologies are not mutually exclusive. A hybrid deployment may use LAN multicast for devices on the source subnet while using WAN relay for remote listeners.¶
The control plane is divided into two sub-protocols:¶
Daemon Control Protocol (DCP): a JSON-over-WebSocket protocol exposed by the solunad transmitter daemon on TCP port 8400. DCP allows local applications to start and stop transmission, adjust channel parameters, and perform file transfers. See Section 9.¶
RTCP packets as defined in [RFC3550] are used for congestion feedback. Receiver Reports (RR) carry loss fraction, cumulative loss, inter-arrival jitter, and delay statistics that the source node uses for bitrate adaptation (Section 7).¶
For large deployments, relay nodes form a distribution tree rooted at the source node. Each relay node forwards incoming audio packets to at most four downstream subscribers (fanout-4). Downstream subscribers may themselves be relay nodes, creating a tree of depth proportional to log4(N) for N total leaf nodes.¶
Source Node
|
[Relay Node 0]
/ | \ \
[R1] [R2] [R3] [R4] <- relay level 1
/ | \ / \ | /|\
L L L L L L L L L <- leaf nodes
¶
Tree construction is driven by the relay; individual relay nodes are unaware of the full tree topology and need only track their direct upstream and up to four direct downstream peers.¶
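The relationship between subscriber count and tree depth can be sketched as follows. This is an illustrative calculation only; the function name `relay_levels` is hypothetical, and the sketch assumes a fully populated tree in which every relay serves exactly four downstream peers.

```python
def relay_levels(leaf_count: int, fanout: int = 4) -> int:
    """Smallest number of relay levels whose combined fanout reaches
    leaf_count leaves, i.e. ceil(log_fanout(leaf_count)), minimum 1.

    Integer arithmetic is used instead of math.log to avoid
    floating-point rounding at exact powers of the fanout.
    """
    if leaf_count < 1:
        raise ValueError("leaf_count must be positive")
    levels, capacity = 1, fanout
    while capacity < leaf_count:
        capacity *= fanout
        levels += 1
    return levels
```

Under these assumptions a single relay serves up to 4 leaves, two levels serve up to 16, and three levels serve up to 64.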
An OSTP packet consists of four consecutive regions in a single UDP payload:¶
An optional CRC-32 trailer (4 bytes, IEEE 802.3 polynomial) MAY follow the audio payload when the sender sets the RTP padding (P) bit. Receivers SHOULD verify the CRC when present and MUST discard packets that fail CRC verification.¶
All multi-byte fields are in network (big-endian) byte order unless otherwise noted.¶
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X| CC |M| PT | Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Timestamp |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Synchronization Source (SSRC) Identifier |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Contributing Source (CSRC) list (optional) |
| . . . |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
¶
The four-byte RTP extension header immediately follows the CSRC list and precedes the OSTP extension data:¶
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Profile = 0x4F53 | Length = 0x0002 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
¶
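As a rough illustration of the layout above, the following sketch walks the RTP fixed header, skips the CSRC list, and checks the extension header for the OSTP profile. The function name `parse_rtp_prefix` and the returned dictionary keys are hypothetical; the extension data is returned as raw bytes and left uninterpreted.

```python
import struct

OSTP_PROFILE = 0x4F53  # ASCII "OS"


def parse_rtp_prefix(datagram: bytes) -> dict:
    """Parse the RTP fixed header and, if present, the extension header."""
    if len(datagram) < 12:
        raise ValueError("too short for an RTP fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack_from("!BBHII", datagram, 0)
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    offset = 12 + 4 * (b0 & 0x0F)  # skip the CSRC list (CC entries)
    info = {
        "padding": bool(b0 & 0x20),
        "marker": bool(b1 & 0x80),
        "pt": b1 & 0x7F,
        "seq": seq,
        "timestamp": ts,
        "ssrc": ssrc,
        "ostp_ext": None,
        "payload_offset": offset,
    }
    if b0 & 0x10:  # X bit: an extension header follows the CSRC list
        if len(datagram) < offset + 4:
            raise ValueError("truncated extension header")
        profile, words = struct.unpack_from("!HH", datagram, offset)
        ext_start = offset + 4
        ext_end = ext_start + 4 * words  # Length counts 32-bit words
        if len(datagram) < ext_end:
            raise ValueError("truncated extension data")
        if profile == OSTP_PROFILE and words == 2:
            info["ostp_ext"] = datagram[ext_start:ext_end]
        info["payload_offset"] = ext_end
    return info
```

A receiver that does not recognise the profile can still use `payload_offset` to locate the payload, which matches the degradation path described earlier.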
The OSTP extension data is 8 bytes and immediately follows the RTP extension header:¶
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|C C C C| Stream ID (12 bits) | SeqExt (high) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SeqExt (low 8 bits) | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +
| Media Timestamp (32 bits) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
¶
More precisely, the 16-bit stream_id field carries both the channel count and the stream identifier:¶
Bits 15-12: TX Channel Count code (CCCC)
Bits 11-0: Stream Identifier (12 bits)¶
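The bit split of the 16-bit stream_id field can be sketched with two small helpers. The function names are hypothetical, and the meaning of individual channel count codes is not assumed here.

```python
def split_stream_field(field: int) -> tuple:
    """Return (channel_count_code, stream_identifier) from the 16-bit field."""
    if not 0 <= field <= 0xFFFF:
        raise ValueError("field must fit in 16 bits")
    return field >> 12, field & 0x0FFF


def join_stream_field(cc_code: int, stream_identifier: int) -> int:
    """Pack a 4-bit channel count code and a 12-bit stream identifier."""
    if not 0 <= cc_code <= 0xF:
        raise ValueError("channel count code must fit in 4 bits")
    if not 0 <= stream_identifier <= 0xFFF:
        raise ValueError("stream identifier must fit in 12 bits")
    return (cc_code << 12) | stream_identifier
```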
OSTP defines the following dynamic payload types. All are in the dynamic range (96–127) as defined by [RFC3550].¶
| PT | Name | Description | Clock Rate |
|---|---|---|---|
| 96 | PCM24 | Interleaved signed 24-bit integer PCM, big-endian, packed 3 bytes per sample. Sample rate is signalled in the RTP Timestamp clock rate (44100 or 48000 Hz). | 44100 or 48000 Hz |
| 97 | F32 | Interleaved 32-bit IEEE 754 single-precision floating-point PCM, big-endian, normalised to [-1.0, +1.0]. | 44100 or 48000 Hz |
| 98 | OPUS | Opus-encoded audio as defined in [RFC6716]. Each RTP packet carries exactly one Opus frame. | 48000 Hz |
| 126 | NACK | Negative Acknowledgement control packet. Payload is a list of 16-bit RTP sequence numbers for which retransmission is requested. See Section 7.4. | N/A |
| 127 | FEC/XOR | XOR-based Forward Error Correction parity packet covering the preceding group of N audio packets. See Section 7.3. | same as protected stream |
Implementations MUST support PT=98 (Opus) and PT=127 (FEC/XOR). Support for PT=96 (PCM24), PT=97 (F32), and PT=126 (NACK) is RECOMMENDED.¶
When the RTP padding (P) bit is set, the last 4 bytes of the UDP payload are a CRC-32 checksum computed over the audio payload only (i.e., excluding all RTP and OSTP headers, and excluding the CRC field itself). The polynomial is the IEEE 802.3 CRC-32 (0xEDB88320 reflected, initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF). The value is stored in big-endian byte order.¶
Receivers SHOULD verify the CRC. A receiver that encounters a CRC mismatch MUST discard the packet and MAY issue a NACK (Section 7.4) for retransmission.¶
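The CRC parameters given above match the CRC-32 computed by Python's `zlib.crc32`, so the trailer handling can be sketched as below. The helper names are hypothetical, and the input to `strip_crc` is assumed to be the audio payload region with its trailer, with all RTP and OSTP headers already removed.

```python
import struct
import zlib
from typing import Optional


def append_crc(audio_payload: bytes) -> bytes:
    """Append the big-endian CRC-32 of the audio payload as a 4-byte trailer."""
    return audio_payload + struct.pack("!I", zlib.crc32(audio_payload))


def strip_crc(region: bytes) -> Optional[bytes]:
    """Return the audio payload if the trailer matches, else None (discard)."""
    if len(region) < 4:
        return None
    payload, trailer = region[:-4], region[-4:]
    (expected,) = struct.unpack("!I", trailer)
    return payload if zlib.crc32(payload) == expected else None
```

A `None` result corresponds to the discard case; whether to follow it with a NACK is left to the caller.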
An OSTP channel is identified by a human-readable name — a UTF-8 string of 1 to 64 bytes. Channel names MUST NOT contain ASCII control characters (U+0000–U+001F) or the characters '/' and '#'. Channel names are case-sensitive.¶
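A minimal validation sketch for these naming rules follows. The function name is hypothetical, and the check covers only the constraints stated above.

```python
def is_valid_channel_name(name: str) -> bool:
    """Check length (1 to 64 UTF-8 bytes) and forbidden characters."""
    try:
        encoded = name.encode("utf-8")
    except UnicodeEncodeError:
        return False  # e.g. lone surrogates cannot be encoded as UTF-8
    if not 1 <= len(encoded) <= 64:
        return False
    # Reject ASCII control characters U+0000..U+001F and '/' and '#'.
    return not any(ord(ch) <= 0x1F or ch in "/#" for ch in name)
```

The length limit applies to encoded bytes rather than characters, so a name made of two-byte characters is limited to 32 of them.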
The mapping from channel name to transport endpoints is:¶
The stream_id field in the OSTP extension header provides a second level of addressing within a channel, allowing concurrent transmission of multiple bitrate variants or codec alternatives from a single source. Receivers select the stream_id appropriate for their capabilities and network conditions.¶
The Relay Signalling Protocol (RSP) is a line-oriented text protocol carried in UDP datagrams. Each message is a single line terminated by a newline (LF, 0x0A) character. Fields within a message are separated by a single space. All messages MUST be no longer than 1024 bytes including the terminating newline.¶
Unless otherwise noted, RSP messages are exchanged on the same UDP port as OSTP audio datagrams (port 5100 for the relay node). RSP messages and OSTP audio datagrams are distinguished by inspecting the first byte: RSP messages begin with an ASCII uppercase letter (0x41–0x5A), whereas OSTP packets begin with a byte whose two high bits are 10 (RTP V=2), for example 0x90 (V=2, P=0, X=1, CC=0), since OSTP packets set the extension (X) bit.¶
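The first-byte test and the framing rules can be sketched as follows. The function names and the returned labels are hypothetical, and the sketch does not validate individual message types.

```python
def classify_datagram(data: bytes) -> str:
    """Classify a datagram received on the shared UDP port."""
    if not data:
        return "unknown"
    first = data[0]
    if 0x41 <= first <= 0x5A:   # ASCII 'A'..'Z': RSP text message
        return "rsp"
    if first >> 6 == 0b10:      # RTP version 2 in the two high bits
        return "rtp"
    return "unknown"


def parse_rsp(data: bytes) -> list:
    """Split one RSP line into its space-separated fields."""
    if len(data) > 1024:
        raise ValueError("RSP message exceeds 1024 bytes")
    if not data.endswith(b"\n"):
        raise ValueError("RSP message must end with LF")
    return data[:-1].decode("utf-8").split(" ")
```

Because the RSP range (0x41–0x5A) and the RTP range (0x80–0xBF) do not overlap, a single comparison on the first byte is sufficient.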
JOIN <channel> [<wallet>]
Sent by a leaf node or relay node to the relay server to
subscribe to a channel. <channel> is the
channel name. The optional <wallet> field is
a base58-encoded public key (e.g., a Solana address) used for
micropayment routing.¶
Upon receiving a valid JOIN, the relay MUST respond with a HELLO message and begin forwarding OSTP packets for the requested channel to the sender's source address.¶
HELLO <channel> <relay_id> <server_ts>
<relay_id> is an opaque identifier for the relay node (MAY be its public IP address and port in the form addr:port). <server_ts> is the relay server's current UNIX timestamp in milliseconds, used by receivers for initial clock offset estimation.¶
MEMBERS <channel> <count> [<wallet1> ...]
<count> is the current number of active subscribers. The optional wallet addresses allow the source node to compute royalty splits for micropayment distribution.¶
LEAVE <channel>
WALLET <channel> <wallet>
PEER <channel> <addr> <port>
CHARGE <channel> <amount_usat> <wallet>
<amount_usat> is the payment amount in micro-satoshis (or equivalent base units for the configured payment rail).¶
TIP <channel> <amount_usat> <from_wallet>
PING
PONG message. Implementations SHOULD send PING messages at intervals no longer than 25 seconds when no other traffic has been exchanged, in order to maintain NAT bindings.¶
PONG

A relay node MUST maintain the following state per subscribed (channel, client-address) pair:¶
A subscription entry MUST be removed if no RSP or OSTP traffic has been received from the client for more than 60 seconds.¶
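The 60-second expiry rule can be sketched with a small table keyed by (channel, client-address). The class and method names are hypothetical, and only a last-seen timestamp is tracked here; a real relay would keep whatever additional per-subscription state it needs.

```python
class SubscriptionTable:
    """Tracks the last time traffic was seen from each subscriber."""

    TIMEOUT_S = 60.0

    def __init__(self):
        self._last_seen = {}

    def touch(self, channel, client_addr, now):
        """Record RSP or OSTP traffic from a subscriber at time `now`."""
        self._last_seen[(channel, client_addr)] = now

    def expire(self, now):
        """Remove and return entries silent for more than TIMEOUT_S seconds."""
        stale = [key for key, seen in self._last_seen.items()
                 if now - seen > self.TIMEOUT_S]
        for key in stale:
            del self._last_seen[key]
        return stale

    def __contains__(self, key):
        return key in self._last_seen
```

Timestamps are passed in by the caller (for example from `time.monotonic()`), which keeps the table easy to test.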
OSTP implements congestion control in accordance with the guidelines in [RFC8085]. The control loop combines RTCP Receiver Reports with an application-layer bitrate ladder.¶
Leaf nodes MUST send RTCP Receiver Reports as defined in [RFC3550], Section 6.4. The reporting interval SHOULD be between 1 and 5 seconds, computed using the RTCP timing algorithm. Each Receiver Report block carries:¶
The source node uses the fraction-lost and jitter fields as the primary inputs to the bitrate adaptation algorithm described in Section 7.2.¶
When the active payload type is PT=98 (Opus), the source node selects an encoding bitrate from the following ladder:¶
| Bitrate (kbps) | Use case | Approximate payload size per 20 ms packet |
|---|---|---|
| 32 | Minimum quality / severe loss | ~80 bytes |
| 64 | Low-bandwidth / mobile | ~160 bytes |
| 128 | Near-CD quality | ~320 bytes |
| 192 | High quality | ~480 bytes |
| 320 | Maximum quality / LAN | ~800 bytes |
Bitrate upgrade (step up the ladder) is permitted at most once every 10 seconds, and only when the fraction-lost value in the most recent RTCP RR is below 0.5% and inter-arrival jitter is below 20 ms.¶
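The ladder arithmetic and the upgrade rule can be sketched as below. The names are hypothetical; the sketch assumes the loss input is the raw 8-bit fraction-lost field from the RTCP Receiver Report (loss = field / 256) and covers only the upgrade direction.

```python
LADDER_KBPS = (32, 64, 128, 192, 320)


def payload_bytes(bitrate_kbps: int, frame_ms: int = 20) -> int:
    """Approximate Opus payload size for one frame at the given bitrate."""
    return bitrate_kbps * 1000 * frame_ms // 8000


def next_bitrate_up(current_kbps: int, fraction_lost_field: int,
                    jitter_ms: float, seconds_since_last_upgrade: float) -> int:
    """Return the next ladder step if the upgrade conditions hold."""
    loss = fraction_lost_field / 256  # RTCP RR fixed-point fraction
    index = LADDER_KBPS.index(current_kbps)
    allowed = (seconds_since_last_upgrade >= 10
               and loss < 0.005
               and jitter_ms < 20)
    if allowed and index + 1 < len(LADDER_KBPS):
        return LADDER_KBPS[index + 1]
    return current_kbps
```

With an 8-bit loss field, the 0.5% threshold is met only by the values 0 and 1, since 2/256 is already about 0.78%.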
Bitrate downgrade (step down the ladder) MUST be triggered immediately when either:¶
When operating over LAN multicast or with a PCM payload type (PT=96, PT=97), the source node does not apply bitrate adaptation; it transmits at the full sample rate and relies on FEC (Section 7.3) and NACK (Section 7.4) for loss recovery.¶
OSTP uses a simple XOR-parity FEC scheme to recover from single packet losses within a protection group. The scheme is similar to the one described in RFC 5109 but is not wire-compatible with it.¶
The source node groups consecutive audio packets into blocks of N packets (default N=5). After each block, the source MUST transmit one FEC parity packet (PT=127) whose payload is the byte-wise XOR of all N audio payloads in the block, zero-padded to the length of the longest payload.¶
The FEC packet carries in its RTP Timestamp field the RTP Timestamp of the first packet in the block. The OSTP stream_id and sequence_ext fields mirror those of the first packet in the block. The RTP Sequence Number of the FEC packet is N higher than that of the first packet in the block, i.e., it immediately follows the block in sequence-number space.¶
Audio packets: [seq=100] [seq=101] [seq=102] [seq=103] [seq=104]
FEC packet: [seq=105, PT=127, payload = XOR(100..104)]
If seq=102 is lost, the receiver can recover it:
payload[102] = XOR(payload[100], payload[101],
payload[103], payload[104], FEC_payload)
¶
A receiver that detects a single loss within a protection block (via sequence number gap) SHOULD wait up to one inter-packet interval (the packet period) for the FEC packet before concealing the loss. If the FEC packet arrives in time, the receiver MUST attempt recovery. Recovery is not possible if two or more packets in the same block are lost; in that case the receiver SHOULD apply packet loss concealment.¶
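The parity and recovery operations can be sketched as follows. The function names are hypothetical and the sketch operates on payload bytes only, leaving packet headers aside.

```python
def xor_parity(payloads) -> bytes:
    """Byte-wise XOR of all payloads, zero-padded to the longest one."""
    width = max(len(p) for p in payloads)
    parity = bytearray(width)
    for payload in payloads:
        for i, byte in enumerate(payload):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received, parity: bytes) -> bytes:
    """Recover the single missing payload of a block.

    XORing the parity with the N-1 received payloads cancels them out
    and leaves the missing payload, padded with zeros to the parity length.
    """
    return xor_parity(list(received) + [parity])
```

Note that the recovered payload has the length of the parity packet: if the lost payload was shorter than the longest payload in its block, the result carries trailing zero bytes, and this sketch makes no attempt to determine the original length.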
The protection block size N MAY be negotiated as part of session setup via the Daemon Control Protocol. Senders SHOULD NOT use values of N below 3 or above 10.¶
For relay-connected streams where RTT is low enough to make retransmission practical (RTT < 50 ms), receivers MAY request retransmission of lost packets using NACK packets (PT=126).¶
A NACK packet payload consists of one or more 16-bit unsigned integers in network byte order, each representing the RTP Sequence Number of a lost audio packet. A single NACK packet MUST NOT carry more than 32 sequence numbers.¶
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Lost Seq #1 | Lost Seq #2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Lost Seq #3 | ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
¶
Upon receipt of a NACK, the source node or relay node SHOULD retransmit the requested packets from its transmit buffer. The retransmit buffer at the source SHOULD hold at least 200 ms of audio. Retransmitted packets are sent as normal OSTP audio packets with the original sequence numbers and timestamps; the RTP Marker (M) bit is set to indicate that the packet is a retransmission.¶
Receivers SHOULD limit NACK transmission to at most one NACK per lost sequence number, and SHOULD NOT send NACKs for packets that are more than 500 ms old.¶
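The NACK payload encoding can be sketched with the standard `struct` module. The function names are hypothetical, and only the payload is built here, not the surrounding RTP packet.

```python
import struct

MAX_NACK_ENTRIES = 32


def pack_nack(lost_seqs) -> bytes:
    """Encode 1 to 32 lost RTP sequence numbers as big-endian uint16 values."""
    if not 1 <= len(lost_seqs) <= MAX_NACK_ENTRIES:
        raise ValueError("a NACK carries between 1 and 32 sequence numbers")
    return struct.pack("!%dH" % len(lost_seqs), *lost_seqs)


def unpack_nack(payload: bytes) -> list:
    """Decode a NACK payload into a list of sequence numbers."""
    if not payload or len(payload) % 2 or len(payload) > 2 * MAX_NACK_ENTRIES:
        raise ValueError("malformed NACK payload")
    return list(struct.unpack("!%dH" % (len(payload) // 2), payload))
```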
OSTP includes a binary file distribution sub-protocol for delivering audio files, playlist metadata, and artwork to receivers. File transfers are carried over the WebSocket connection between the daemon and a connected client (Section 9), using binary WebSocket frames.¶
Each binary frame begins with a 1-byte opcode that identifies the frame type:¶
| Opcode | Name | Description |
|---|---|---|
| 0xFA | FILE_BEGIN (current) | Begins transfer of the currently playing file. The frame body contains a fixed 32-byte header followed by the first chunk of file data. |
| 0xFB | FILE_DATA (current) | A continuation chunk for the currently playing file. The frame body is raw file data. |
| 0xFC | FILE_BEGIN (next) | Pre-fetches the next track while the current track is still playing. Frame structure is identical to 0xFA. |
| 0xFD | FILE_DATA (next) | Continuation chunk for the next track pre-fetch. |
The 32-byte header in a FILE_BEGIN (0xFA or 0xFC) frame has the following layout:¶
Offset Length Field
------ ------ -----------------------------------------
0 1 Opcode (0xFA or 0xFC)
1 4 Total file size in bytes (uint32, big-endian)
5 1 MIME type length (N)
6 N MIME type string (UTF-8, not null-terminated)
6+N 4 CRC-32 of complete file (uint32, big-endian)
10+N (pad) Zero-padded to 32 bytes total header length
¶
After the 32-byte header, the remainder of the frame contains the first chunk of file data. Subsequent FILE_DATA frames contain additional chunks. The transfer is complete when the cumulative byte count of all chunks equals the total file size declared in the FILE_BEGIN header.¶
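The header layout above can be sketched as a pair of build and parse helpers. The function names are hypothetical. The sketch assumes that the file CRC-32 uses the same IEEE 802.3 parameters as the packet trailer (the layout table does not say so explicitly), and it derives a 22-byte limit on the MIME type from the fixed 32-byte header length.

```python
import struct
import zlib

HEADER_LEN = 32
MAX_MIME_LEN = HEADER_LEN - 10  # opcode(1) + size(4) + length(1) + CRC(4)


def build_file_begin_header(opcode: int, file_bytes: bytes, mime: str) -> bytes:
    """Build the 32-byte FILE_BEGIN header for a complete file."""
    if opcode not in (0xFA, 0xFC):
        raise ValueError("opcode must be 0xFA or 0xFC")
    mime_raw = mime.encode("utf-8")
    if len(mime_raw) > MAX_MIME_LEN:
        raise ValueError("MIME type does not fit in the header")
    header = (struct.pack("!BIB", opcode, len(file_bytes), len(mime_raw))
              + mime_raw
              + struct.pack("!I", zlib.crc32(file_bytes)))
    return header.ljust(HEADER_LEN, b"\x00")  # zero-pad to 32 bytes


def parse_file_begin_header(frame: bytes) -> tuple:
    """Return (opcode, total_size, mime, crc32) from a FILE_BEGIN frame."""
    if len(frame) < HEADER_LEN:
        raise ValueError("frame shorter than the 32-byte header")
    opcode, total_size, mime_len = struct.unpack_from("!BIB", frame, 0)
    if mime_len > MAX_MIME_LEN:
        raise ValueError("MIME length exceeds header capacity")
    mime = frame[6:6 + mime_len].decode("utf-8")
    (crc,) = struct.unpack_from("!I", frame, 6 + mime_len)
    return opcode, total_size, mime, crc
```

File data for the first chunk would start at `frame[32:]`.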
Receivers implement a dual-buffer scheme to eliminate audible gaps between tracks:¶
If the next-track buffer has not reached the pre-buffer threshold at the track boundary, the receiver MAY introduce a short silence rather than underrunning the audio pipeline.¶
The Daemon Control Protocol is a JSON-over-WebSocket protocol
exposed by the solunad transmitter daemon on TCP port 8400
(plain WebSocket; no TLS required for loopback connections).
Remote connections MUST use WSS (WebSocket over TLS).¶
All DCP messages are JSON objects with a mandatory "cmd"
string field. Responses include a "result" field set to
either "ok" or "error", and an optional
"msg" string field for error descriptions.¶
| Command ("cmd") | Parameters | Description |
|---|---|---|
| start | channel (string), codec ("pcm24", "f32", or "opus"), bitrate (int, kbps), sample_rate (int), channels (int, 1–8) | Start transmitting on the specified channel. If already transmitting, the existing session is stopped and restarted with the new parameters. |
| stop | (none) | Stop the active transmission session. |
| status | (none) | Returns current session state including channel name, active payload type, bitrate, packets sent, bytes sent, and connected relay nodes. |
| set_bitrate | bitrate (int, kbps) | Dynamically change the Opus encoding bitrate without interrupting the session. |
| set_fec | enabled (bool), group_size (int, 3–10) | Enable or disable FEC and set the protection block size. |
| relay_add | host (string), port (int) | Add a relay node to the active session. |
| relay_remove | host (string), port (int) | Remove a relay node from the active session. |
| file_send | path (string), slot ("current" or "next") | Initiate file transfer of the specified local file path. slot determines whether 0xFA/0xFB or 0xFC/0xFD opcodes are used. |
| wallet_set | address (string) | Associate a wallet address with the active channel for micropayment collection. Sends a WALLET RSP message to all connected relay nodes. |
| subscribe | events (array of strings) | Subscribe to asynchronous event notifications. Supported event types: "packet_stats", "member_update", "payment", "rtcp". |
After a subscribe command, the daemon emits unsolicited
JSON event objects. Each event object carries an "event"
field instead of a "cmd" field. Examples:¶
Packet statistics event (every 1 second):
{
"event": "packet_stats",
"packets_sent": 2400,
"bytes_sent": 1920000,
"packets_lost_reported": 3,
"bitrate_kbps": 128
}
Member update event:
{
"event": "member_update",
"channel": "mystream",
"count": 7,
"wallets": ["4Zf3...", "9xKL..."]
}
Payment event:
{
"event": "payment",
"type": "tip",
"amount_usat": 1000000,
"from_wallet": "4Zf3..."
}
¶
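A client-side sketch of DCP message handling follows, using only the standard `json` module and leaving the WebSocket transport aside. The function names and the returned labels are hypothetical, as is the error text used in the usage example.

```python
import json


def make_command(cmd: str, **params) -> str:
    """Serialise a DCP command: a JSON object with a mandatory "cmd" field."""
    return json.dumps({"cmd": cmd, **params})


def classify_message(text: str) -> tuple:
    """Classify a message received from the daemon.

    Returns ("event", name) for unsolicited events, ("ok", None) or
    ("error", msg) for command responses, and ("unknown", None) otherwise.
    """
    message = json.loads(text)
    if "event" in message:
        return ("event", message["event"])
    if message.get("result") == "ok":
        return ("ok", None)
    if message.get("result") == "error":
        return ("error", message.get("msg"))
    return ("unknown", None)
```

For example, `make_command("subscribe", events=["packet_stats"])` produces the text frame for a subscription, and a reply such as `{"result": "error", "msg": "no session"}` is classified as `("error", "no session")`.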
OSTP SHOULD be protected with DTLS-SRTP [RFC5764] when operating over the public Internet and MUST use DTLS-SRTP when the stream carries paid content (i.e., when a non-zero charge rate is configured via the economic layer).¶
When DTLS-SRTP is enabled, the DTLS handshake is performed on the same UDP socket used for OSTP audio and RSP signalling. Implementations MUST support the cipher suite TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 and SHOULD support TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256. SRTP protection profile SRTP_AES128_CM_HMAC_SHA1_80 is REQUIRED.¶
RSP text messages sent before the DTLS handshake completes are transmitted in clear text. Implementations MUST limit pre-DTLS RSP to JOIN and HELLO messages only; WALLET, CHARGE, and TIP messages MUST NOT be sent before DTLS-SRTP is established.¶
Relay nodes MAY require a session token in the JOIN message to restrict channel access. The token is appended as an additional field:¶
JOIN <channel> [<wallet>] [token=<base64url-token>]¶
Session tokens are opaque to the relay protocol and are validated by the relay using implementation-specific means (e.g., HMAC-SHA256 signed by the channel owner's key). A relay that requires tokens MUST respond to a JOIN that lacks a valid token with a DENIED RSP message and MUST NOT forward any audio to unauthenticated subscribers.¶
Token issuance and revocation are out of scope for this specification.¶
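Purely as an illustration of the HMAC-SHA256 option mentioned above, one possible token scheme is sketched below. Everything in it is an assumption rather than part of this specification: the token is taken to be the unpadded base64url encoding of an HMAC over the channel name, computed with a key shared between the channel owner and the relay, and the function names are hypothetical.

```python
import base64
import hashlib
import hmac


def make_token(key: bytes, channel: str) -> str:
    """Hypothetical token: base64url(HMAC-SHA256(key, channel)), unpadded."""
    mac = hmac.new(key, channel.encode("utf-8"), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(mac).rstrip(b"=").decode("ascii")


def token_is_valid(key: bytes, channel: str, token: str) -> bool:
    """Recompute the token and compare it in constant time."""
    expected = make_token(key, channel).encode("ascii")
    return hmac.compare_digest(expected, token.encode("utf-8"))
```

A token built this way never expires and is not bound to a subscriber, so a practical scheme would likely include at least an expiry time in the authenticated data.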
Because OSTP is UDP-based, relay nodes are potential amplification vectors in reflection attacks. Relay implementations MUST enforce the following mitigations:¶
Wallet addresses carried in RSP messages are public keys and do not constitute sensitive information. However, implementations MUST NOT process CHARGE or TIP messages from untrusted sources. Specifically:¶
OSTP relay nodes learn the IP addresses of all subscribers to a channel. When DTLS-SRTP is not in use, the relay also has access to the full audio content of the stream. Deployments that require listener privacy MUST use DTLS-SRTP and SHOULD use a relay node operated by or on behalf of the channel owner.¶
Wallet addresses in RSP messages are permanently linkable to IP addresses as observed by the relay. Participants concerned about payment privacy SHOULD use stealth addresses or zero-knowledge payment schemes, which are outside the scope of this specification.¶
This document uses the following UDP and TCP ports. These ports are not currently registered with IANA; the authors intend to request registration if this protocol advances beyond experimental status.¶
| Port | Usage |
|---|---|
| 5004 | OSTP audio (LAN multicast and unicast). Note: Port 5004 is already registered with IANA for "rtp" (RTP media); OSTP is intended to be compatible with this registration. |
| 5100 | OSTP relay node (RSP and forwarded audio datagrams). |
| 8400 | OSTP Daemon Control Protocol (WebSocket, TCP). |
This document defines the RTP header extension profile value 0x4F53 (the ASCII string "OS") to identify OSTP extension data. RTP extension profile values are allocated from the IANA registry "RTP Payload Format media types" (currently unregistered; this document requests registration of 0x4F53 for the "Open Sonic Transport Protocol extension").¶
The following dynamic payload type values are used by OSTP. Dynamic payload types (96–127) do not require IANA registration per [RFC3550] but are listed here for informational purposes. SDP mapping for these payload types follows the procedures of [RFC4566].¶
| PT | Name | Clock Rate | Channels |
|---|---|---|---|
| 96 | OSTP/PCM24 | 44100 or 48000 | 1–8 |
| 97 | OSTP/F32 | 44100 or 48000 | 1–8 |
| 98 | OSTP/OPUS | 48000 | 1–2 |
| 126 | OSTP/NACK | — | — |
| 127 | OSTP/FEC-XOR | (same as protected stream) | — |
This document uses the IPv4 multicast address 239.69.0.1 in the administratively scoped range (239.0.0.0/8). This address is not registered with IANA and is intended for local-network use only. Deployments that require a globally routable multicast address should use the procedures described in [RFC6838].¶
OSTP does not mandate PTP or NTP for clock synchronisation. Instead, receivers estimate the source clock offset from the relationship between the RTP Timestamp and the Media Timestamp in the OSTP extension. The HELLO RSP message provides an initial wall-clock anchor from the relay node, which receivers use for coarse clock alignment. Fine-grained per-packet jitter compensation is performed by the receiver's playout buffer.¶
Implementations targeting sample-accurate multi-room synchronisation on a LAN MAY use PTP (IEEE 1588) as a separate out-of-band mechanism; the OSTP timestamps are then interpreted in the PTP time domain.¶
The reference implementation uses an adaptive playout buffer with a target depth of 50 ms for LAN multicast and 150 ms for WAN relay. The buffer depth is adjusted based on measured inter-arrival jitter: when jitter exceeds 20% of the current target depth, the target is doubled (up to a maximum of 500 ms). When jitter has been below 10% of the current target depth for more than 10 seconds, the target is halved (down to a minimum of 20 ms for LAN, 50 ms for WAN).¶
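The adaptation rule described above can be sketched as a small state machine. This is not the reference implementation; the class and attribute names are hypothetical, and the sketch assumes that the 10-second calm timer restarts after each halving and whenever jitter rises to 10% of the target or more.

```python
class PlayoutTarget:
    """Tracks the target playout buffer depth in milliseconds."""

    MAX_MS = 500.0

    def __init__(self, wan: bool):
        self.target_ms = 150.0 if wan else 50.0
        self.min_ms = 50.0 if wan else 20.0
        self._calm_since = None  # time at which jitter first dropped below 10%

    def update(self, jitter_ms: float, now: float) -> float:
        """Feed one jitter measurement taken at time `now` (seconds)."""
        if jitter_ms > 0.20 * self.target_ms:
            # Jitter too high: double the target, capped at MAX_MS.
            self.target_ms = min(self.target_ms * 2, self.MAX_MS)
            self._calm_since = None
        elif jitter_ms < 0.10 * self.target_ms:
            if self._calm_since is None:
                self._calm_since = now
            elif now - self._calm_since > 10.0:
                # Calm for more than 10 s: halve the target, floored at min_ms.
                self.target_ms = max(self.target_ms / 2, self.min_ms)
                self._calm_since = None
        else:
            self._calm_since = None
        return self.target_ms
```

Passing the clock value in as `now` keeps the sketch deterministic; a caller would typically supply `time.monotonic()`.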
The OSTP packet parser requires only the following operations: byte-order conversion, 32-bit integer arithmetic, and CRC-32 computation. The minimum receive buffer to process a single OSTP packet without dynamic allocation is 1500 bytes (one Ethernet MTU). Implementations on microcontrollers with less than 64 KB of RAM are feasible using PT=98 (Opus) and a shallow playout buffer (20–40 ms).¶
The reference ESP32 implementation receives Opus-coded OSTP packets from the WAN relay and decodes them using the Opus codec library compiled for Xtensa LX6, achieving end-to-end latency of approximately 200 ms over a Wi-Fi link.¶