Voice Connections

Voice connections operate in a similar fashion to the Gateway connection. However, they use a different set of payloads and a separate UDP-based connection for RTC data transmission. Because UDP is generally used for both receiving and transmitting RTC data, your client must be able to receive UDP packets, even through a firewall or NAT (see UDP Hole Punching for more information). The Discord voice servers implement functionality (see IP Discovery) for discovering the client's external UDP IP address and port, which can help in some network configurations. If you cannot support a UDP connection, you may implement a WebRTC connection instead.

Audio and video from a "Go Live" stream require a separate connection to another voice server. Only microphone and camera data are sent over the normal connection.

Voice Gateway

To ensure that you have the most up-to-date information, please use version 8. Otherwise, the events and commands documented here may not reflect what you receive over the socket. Video is only fully supported on Gateway v5 and above.

Gateway Versions

| Version | Status | Change |
|---|---|---|
| 9 | Recommended | Added channel_id to Opcode 0 Identify and Opcode 7 Resume |
| 8 | Recommended | Added buffered resuming |
| 7 | Available | Added Opcode 17 Channel Options Update |
| 6 | Available | Added Opcode 16 Voice Backend Version |
| 5 | Available | Added Opcode 15 Media Sink Wants |
| 4 | Available | Changed speaking status from boolean to bitmask |
| 3 | Deprecated | Added video functionality, consolidated Opcode 8 Hello payload |
| 2 | Deprecated | Changed Gateway heartbeat reply to Opcode 6 Heartbeat ACK |
| 1 | Deprecated | Initial version |

Gateway Commands

| Name | Description |
|---|---|
| Identify | Start a new voice connection |
| Resume | Resume a dropped connection |
| Heartbeat | Maintain an active WebSocket connection |
| Media Sink Wants | Indicate the desired media stream quality |
| Select Protocol | Select the voice protocol and mode |
| Session Update | Indicate the client's supported codecs |
| Speaking | Indicate the user's speaking state |
| Voice Backend Version | Request the current voice backend version |

Gateway Events

| Name | Description |
|---|---|
| Hello | Defines the heartbeat interval |
| Heartbeat ACK | Acknowledges a received client heartbeat |
| Client Connect | A user connected to voice, also sent on initial connection to inform the client of existing users |
| Client Flags | Contains the flags of a user that connected to voice, also sent on initial connection for each existing user |
| Client Platform | Contains the platform type of a user that connected to voice, also sent on initial connection for each existing user |
| Client Disconnect | A user disconnected from voice |
| Media Sink Wants | Requested media stream quality updated |
| Ready | Contains SSRC, IP/Port, experiment, and encryption mode information |
| Resumed | Acknowledges a successful connection resume |
| Session Description | Acknowledges a successful protocol selection and contains the information needed to send/receive RTC data |
| Session Update | Client session description changed |
| Speaking | User speaking state updated |
| Voice Backend Version | Current voice backend version information, as requested by the client |

Connecting to Voice

Retrieving Voice Server Information

The first step in connecting to a voice server (and in turn, a guild's voice channel or private channel) is formulating a request that can be sent to the Gateway, which will return information about the voice server we will connect to. Because Discord's voice platform is widely distributed, users should never cache or save the results of this call. To inform the Gateway of our intent to establish voice connectivity, we first send an Update Voice State payload.

If our request succeeds, the Gateway will respond with two events, a Voice State Update event and a Voice Server Update event, so you must wait for both events before continuing. The first contains a new key, `session_id`, and the second provides the voice server information we can use to establish a new voice connection.

With this information, we can move on to establishing a voice WebSocket connection.

When changing channels within the same guild, it is possible to receive a Voice Server Update with the same `endpoint` as the existing session. However, the `token` will be changed, and you cannot reuse the previous session during a channel change, even if the endpoint remains the same.
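Waiting for both Gateway events, in whichever order they arrive, can be sketched with asyncio. This is a minimal sketch rather than a library API: `VoiceInfoCollector`, `on_dispatch`, and `VoiceServerInfo` are hypothetical names, and it assumes your main Gateway client passes every dispatch event name and its `d` payload to `on_dispatch`. The fields it reads (`user_id` and `session_id` from Voice State Update; `token` and `endpoint` from Voice Server Update) are the ones described above.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class VoiceServerInfo:
    session_id: str  # from Voice State Update
    token: str       # from Voice Server Update
    endpoint: str    # from Voice Server Update; has no URL scheme


class VoiceInfoCollector:
    """Hypothetical helper that resolves once both Gateway events have arrived."""

    def __init__(self, user_id: str) -> None:
        # Must be constructed inside a running event loop (e.g. in a coroutine).
        loop = asyncio.get_running_loop()
        self._user_id = user_id
        self._session_id = loop.create_future()
        self._server = loop.create_future()

    def on_dispatch(self, event: str, data: dict) -> None:
        if event == "VOICE_STATE_UPDATE":
            # Voice State Updates arrive for every user; keep only our own.
            if data.get("user_id") == self._user_id and not self._session_id.done():
                self._session_id.set_result(data["session_id"])
        elif event == "VOICE_SERVER_UPDATE" and not self._server.done():
            self._server.set_result(data)

    async def wait(self, timeout: float = 10.0) -> VoiceServerInfo:
        # Fresh values are needed for every connection; do not cache them.
        session_id, server = await asyncio.wait_for(
            asyncio.gather(self._session_id, self._server), timeout
        )
        return VoiceServerInfo(session_id, server["token"], server["endpoint"])
```

Using futures rather than a flag-and-poll loop means the order of the two events does not matter, and the timeout covers the case where the Gateway never answers.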

Establishing a Voice WebSocket Connection

Once we have the `session_id`, `token`, and `endpoint`, we can connect and handshake with the voice server over another secure WebSocket. Unlike the Gateway endpoint we receive in a Get Gateway request, the endpoint received in our Voice Server Update payload does not contain a URL scheme, so some libraries may require manually prepending `wss://` before connecting. Once connected to the voice WebSocket endpoint, we can immediately send an Opcode 0 Identify payload:

Identify Structure

| Field | Type | Description |
|---|---|---|
| server_id | snowflake | The ID of the guild, private channel, stream, or lobby being connected to |
| channel_id ¹ | snowflake | The ID of the channel being connected to |
| user_id | snowflake | The ID of the current user |
| session_id | string | The session ID of the current session |
| token | string | The voice token for the current session |
| video? | boolean | Whether this connection supports video (default false) |
| streams? | array[stream object] | Simulcast streams to send |

¹ Only required for Gateway v9 and above.

Stream Structure

| Field | Type | Description |
|---|---|---|
| type ¹ | string | The type of media stream to send |
| rid | string | The RTP stream ID |
| quality? | integer | The media quality to send (0-100, default 0) |
| active? | boolean | Whether the stream is active (default false) |
| max_bitrate? | integer | The maximum bitrate to send in bps |
| max_framerate? | integer | The maximum framerate to send in fps |
| max_resolution? | stream resolution object | The maximum resolution to send |
| ssrc? | integer | The SSRC of the stream |
| rtx_ssrc? | integer | The SSRC of the retransmission stream |

¹ Currently, this field is ignored and always set to `video`.

Media Type

| Value | Description |
|---|---|
| audio | Audio |
| video | Video |
| screen | Screenshare |
| test | Speed test |

Stream Resolution Structure

| Field | Type | Description |
|---|---|---|
| type | string | The resolution type to use |
| width | number | The fixed resolution width, or 0 for source |
| height | number | The fixed resolution height, or 0 for source |

Resolution Type

| Value | Description |
|---|---|
| fixed | Fixed resolution |
| source | Source resolution |

Example Identify

{  "op": 0,  "d": {    "server_id": "41771983423143937",    "user_id": "104694319306248192",    "session_id": "30f32c5d54ae86130fc4a215c7474263",    "token": "66d29164ee8cd919",    "video": true,    "streams": [      { "type": "video", "rid": "100", "quality": 100 },      { "type": "video", "rid": "50", "quality": 50 }    ]  }}
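A sketch of this step in Python, assuming you already hold the three values from the Gateway. `voice_ws_url` and `identify_payload` are hypothetical helper names. Selecting the voice Gateway version with a `?v=` query parameter is an assumption not described in this section, so verify it before relying on it.

```python
from typing import List, Optional


def voice_ws_url(endpoint: str, version: int = 8) -> str:
    # The Voice Server Update endpoint has no scheme, so prepend wss://.
    # Assumption: the Gateway version is chosen with a ?v= query parameter.
    if not endpoint.startswith(("wss://", "ws://")):
        endpoint = "wss://" + endpoint
    return f"{endpoint.rstrip('/')}/?v={version}"


def identify_payload(
    server_id: str,
    user_id: str,
    session_id: str,
    token: str,
    channel_id: Optional[str] = None,
    video: bool = False,
    streams: Optional[List[dict]] = None,
) -> dict:
    d = {
        "server_id": server_id,
        "user_id": user_id,
        "session_id": session_id,
        "token": token,
    }
    if channel_id is not None:  # required on Gateway v9 and above
        d["channel_id"] = channel_id
    if video:  # video defaults to false, so only send it when enabled
        d["video"] = True
    if streams:  # simulcast streams this client intends to send
        d["streams"] = streams
    return {"op": 0, "d": d}
```

The resulting dict is serialized with `json.dumps` and sent as a text frame once the socket is open.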

The voice server should respond with an Opcode 2 Ready payload, which informs us of the SSRC, connection IP/port, supported encryption modes, and experiments the voice server supports:

Ready Structure

| Field | Type | Description |
|---|---|---|
| ssrc | integer | The SSRC of the user's voice connection |
| ip | string | The IP address of the voice server |
| port | integer | The port of the voice server |
| modes | array[string] | Supported voice encryption modes |
| experiments | array[string] | Available voice experiments |
| streams | array[stream object] | Populated simulcast streams |

Example Ready

{  "op": 2,  "d": {    "ssrc": 12871,    "ip": "127.0.0.1",    "port": 1234,    "modes": [      "aead_aes256_gcm_rtpsize",      "aead_aes256_gcm",      "aead_xchacha20_poly1305_rtpsize",      "xsalsa20_poly1305_lite_rtpsize",      "xsalsa20_poly1305_lite",      "xsalsa20_poly1305_suffix",      "xsalsa20_poly1305"    ],    "experiments": ["fixed_keyframe_interval"],    "streams": [      {        "type": "video",        "ssrc": 12872,        "rtx_ssrc": 12873,        "rid": "50",        "quality": 50,        "active": false      },      {        "type": "video",        "ssrc": 12874,        "rtx_ssrc": 12875,        "rid": "100",        "quality": 100,        "active": false      }    ]  }}

Establishing a Voice Connection

Once we receive the properties of a voice server from our Ready payload, we can proceed to the final step of voice connections, which entails establishing and handshaking a connection for RTC data. First, we establish either a UDP connection using the Ready payload data, or prepare a WebRTC SDP. We then send an Opcode 1 Select Protocol with details about our connection:

Select Protocol Structure

| Field | Type | Description |
|---|---|---|
| protocol | string | The voice protocol to use |
| data | ?protocol data \| string | The voice connection data or WebRTC SDP |
| rtc_connection_id? | string | The UUID RTC connection ID, used for analytics |
| codecs? | array[codec object] | The supported audio/video codecs |
| experiments? | array[string] | The received voice experiments to enable |

Protocol Type

| Value | Description |
|---|---|
| udp | Standard UDP voice connection |
| webrtc | WebRTC voice connection |
| ~~webrtc-p2p~~ | ~~WebRTC peer-to-peer voice connection~~ |

Protocol Data Structure

| Field | Type | Description |
|---|---|---|
| address ¹ | string | The discovered IP address of the client |
| port ¹ | integer | The discovered UDP port of the client |
| mode | string | The encryption mode to use |

¹ These fields are only used to receive RTC data. If you only wish to send frames and do not care about receiving, you can randomize these values.

Codec Structure

| Field | Type | Description |
|---|---|---|
| name | string | The name of the codec |
| type | string | The type of codec |
| priority ¹ | integer | The preferred priority of the codec as a multiple of 1000 (unique per type) |
| payload_type ² | integer | The dynamic RTP payload type of the codec |
| rtx_payload_type? | integer | The dynamic RTP payload type of the retransmission codec (video-only) |
| encode? | boolean | Whether the client supports encoding this codec (default true) |
| decode? | boolean | Whether the client supports decoding this codec (default true) |

¹ For audio, Opus is the only available codec and should be priority `1000`.

² No payload type should be set to , as it is reserved for probe packets.

Supported Codecs

Providing codecs is optional for backwards compatibility with old clients and bots that do not handle video. If the client does not provide any codecs, the server assumes an Opus audio codec with a payload type of `120` and no specific video codec. If no connected client specifies a video codec, the server defaults to H264.

| Type | Name | Status |
|---|---|---|
| audio | opus | Required |
| video | AV1 | Preferred |
| video | H265 | Preferred |
| video | H264 | Default |
| video | VP8 | Available |
| video | VP9 | Available |

Example Select Protocol

{  "op": 1,  "d": {    "protocol": "udp",    "data": {      "address": "127.0.0.1",      "port": 1337,      "mode": "aead_aes256_gcm_rtpsize"    },    "codecs": [      {        "name": "opus",        "type": "audio",        "priority": 1000,        "payload_type": 120      },      {        "name": "AV1",        "type": "video",        "priority": 1000,        "payload_type": 101,        "rtx_payload_type": 102,        "encode": false,        "decode": true      },      {        "name": "H264",        "type": "video",        "priority": 2000,        "payload_type": 103,        "rtx_payload_type": 104,        "encode": true,        "decode": true      }    ],    "rtc_connection_id": "d6b92f64-40df-48eb-8bce-7facb043149a",    "experiments": ["fixed_keyframe_interval"]  }}
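The Select Protocol payload for a UDP connection can be assembled as below. This is a hedged sketch: the helper names are hypothetical, the Opus payload type mirrors the example above, and `check_codecs` only enforces the priority rule from the Codec Structure notes (multiples of 1000, unique per codec type). Generating `rtc_connection_id` with `uuid.uuid4()` assumes any random UUID is acceptable for this analytics field.

```python
import uuid
from typing import List, Optional


def opus_codec(payload_type: int = 120) -> dict:
    # Opus is the only audio codec and uses priority 1000.
    return {"name": "opus", "type": "audio", "priority": 1000, "payload_type": payload_type}


def check_codecs(codecs: List[dict]) -> None:
    seen = set()
    for codec in codecs:
        key = (codec["type"], codec["priority"])
        if codec["priority"] % 1000 != 0 or key in seen:
            raise ValueError(f"invalid priority for codec {codec['name']}")
        seen.add(key)


def select_protocol_udp(address: str, port: int, mode: str,
                        codecs: Optional[List[dict]] = None) -> dict:
    d = {
        "protocol": "udp",
        # address/port come from IP Discovery; they only matter for receiving.
        "data": {"address": address, "port": port, "mode": mode},
        "rtc_connection_id": str(uuid.uuid4()),
    }
    if codecs:  # optional; omitting them implies Opus only
        check_codecs(codecs)
        d["codecs"] = codecs
    return {"op": 1, "d": d}
```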

Encryption Mode

The RTP size variants determine the unencrypted size of the RTP header in the same way as SRTP, which considers CSRCs and (optionally) the extension preamble to be part of the unencrypted header. The deprecated variants use a fixed size unencrypted header for RTP.
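As a rough illustration of the RTP size rule, the unencrypted header length can be derived from the first header byte: 12 fixed bytes, plus 4 bytes per CSRC (the low four bits), plus the 4-byte extension preamble when the extension bit (`0x10`) is set. The extension body itself stays in the encrypted part. This sketch follows the standard RTP header layout from RFC 3550; the function name is hypothetical.

```python
def rtpsize_header_len(packet: bytes) -> int:
    """Length of the unencrypted RTP header for the *_rtpsize modes."""
    if len(packet) < 12:
        raise ValueError("packet is shorter than the fixed 12-byte RTP header")
    first = packet[0]
    csrc_count = first & 0x0F           # CC field
    has_extension = bool(first & 0x10)  # X bit
    length = 12 + 4 * csrc_count + (4 if has_extension else 0)
    if len(packet) < length:
        raise ValueError("packet is shorter than its declared header")
    return length

# The deprecated (non-rtpsize) modes always treat exactly 12 bytes as the header.
```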

The Gateway reports the available encryption modes in Opcode 2 Ready. The compatible modes will always include `aead_xchacha20_poly1305_rtpsize` but may not include `aead_aes256_gcm_rtpsize`, depending on the underlying hardware. You must support `aead_xchacha20_poly1305_rtpsize`, and you should prefer `aead_aes256_gcm_rtpsize` when it is available.

| Value | Name | Nonce | Status |
|---|---|---|---|
| aead_aes256_gcm_rtpsize | AEAD AES256-GCM (RTP Size) | 32-bit incremental integer value appended to payload | Preferred |
| aead_xchacha20_poly1305_rtpsize | AEAD XChaCha20 Poly1305 (RTP Size) | 32-bit incremental integer value appended to payload | Required |
| xsalsa20_poly1305_lite_rtpsize | XSalsa20 Poly1305 Lite (RTP Size) | 32-bit incremental integer value appended to payload | Deprecated |
| aead_aes256_gcm | AEAD AES256-GCM | 32-bit incremental integer value appended to payload | Deprecated |
| xsalsa20_poly1305 | XSalsa20 Poly1305 | Copy of RTP header | Deprecated |
| xsalsa20_poly1305_suffix | XSalsa20 Poly1305 (Suffix) | 24 random bytes | Deprecated |
| xsalsa20_poly1305_lite | XSalsa20 Poly1305 (Lite) | 32-bit incremental integer value appended to payload | Deprecated |
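Choosing a mode from the Ready payload and producing the incrementing nonce might look like the sketch below. The preference order follows the paragraph above the table. How the 32-bit counter sits inside the full cipher nonce is an assumption taken from common open-source clients rather than from this page: big-endian in the first 4 bytes of an otherwise zero nonce (12 bytes for AES-256-GCM, 24 for XChaCha20-Poly1305), with those same 4 bytes appended after the ciphertext. Names are hypothetical, and the actual AEAD encryption call is left to your crypto library.

```python
import struct
from typing import List, Tuple

PREFERRED_MODE = "aead_aes256_gcm_rtpsize"
REQUIRED_MODE = "aead_xchacha20_poly1305_rtpsize"
NONCE_SIZES = {PREFERRED_MODE: 12, REQUIRED_MODE: 24}


def choose_mode(modes: List[str]) -> str:
    for mode in (PREFERRED_MODE, REQUIRED_MODE):
        if mode in modes:
            return mode
    raise ValueError("voice server offered no supported AEAD rtpsize mode")


class NonceCounter:
    def __init__(self, mode: str) -> None:
        self.size = NONCE_SIZES[mode]  # KeyError for modes this sketch does not handle
        self.value = 0

    def advance(self) -> Tuple[bytes, bytes]:
        """Return (full cipher nonce, 4-byte suffix to append to the payload)."""
        suffix = struct.pack(">I", self.value)
        self.value = (self.value + 1) & 0xFFFFFFFF  # 32-bit counter wraps around
        return suffix + bytes(self.size - 4), suffix
```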

Finally, the voice server will respond with an Opcode 4 Session Description that includes the `mode` and `secret_key`, a 32-byte array used for sending and receiving RTC data:

Session Description Structure

| Field | Type | Description |
|---|---|---|
| audio_codec | string | The audio codec to use |
| video_codec | string | The video codec to use |
| media_session_id | string | The media session ID, used for analytics |
| mode? | string | The encryption mode to use, not applicable to WebRTC |
| secret_key? | array[integer] | The 32-byte secret key used for encryption, not applicable to WebRTC |
| sdp? | string | The WebRTC session description protocol |
| keyframe_interval? | integer | The keyframe interval in milliseconds |

Example Session Description

{  "op": 4,  "d": {    "audio_codec": "opus",    "media_session_id": "89f1d62f166b948746f7646713d39dbb",    "mode": "aead_aes256_gcm_rtpsize",    "secret_key": [ ... ],    "video_codec": "H264"  }}

We can now start sending and receiving RTC data over the previously established UDP or WebRTC connection.

Session Updates

At any time, the client may update the `codecs` it supports by sending an Opcode 14 Session Update. If a user who does not support the current codecs joins, or a user indicates that they no longer support the current codecs, the voice server will send an Opcode 14 Session Update:

This may also be sent to update the current `media_session_id` or `keyframe_interval`.

Session Update Structure (Send)

| Field | Type | Description |
|---|---|---|
| codecs | array[codec object] | The supported audio/video codecs |

Session Update Structure (Receive)

| Field | Type | Description |
|---|---|---|
| audio_codec? | string | The new audio codec to use |
| video_codec? | string | The new video codec to use |
| media_session_id? | string | The new media session ID, used for analytics |
| keyframe_interval? | integer | The keyframe interval in milliseconds |

Heartbeating

In order to maintain your WebSocket connection, you need to continuously send heartbeats at the interval determined in Opcode 8 Hello.

This is sent at the start of the connection. Be warned that the Opcode 8 Hello structure differs by Gateway version: versions below v3 use a flat structure without `op` or `d` fields, containing only a single `heartbeat_interval` field. Be sure to expect this format if you use one of those versions.

This heartbeat interval is the minimum interval you should heartbeat at. You can heartbeat at a faster interval if you wish. For example, the web client uses a heartbeat interval of if the Gateway version is v4 or above, and otherwise. The desktop client uses the provided heartbeat interval if the Gateway version is v4 or above, and otherwise.

Hello Structure

| Field | Type | Description |
|---|---|---|
| v | integer | The voice server version |
| heartbeat_interval | integer | The minimum interval (in milliseconds) the client should heartbeat at |

Example Hello

{  "op": 8,  "d": {    "v": 8,    "heartbeat_interval": 41250  }}

The Gateway may request a heartbeat from the client in some situations by sending an Opcode 3 Heartbeat. When this occurs, the client should immediately send an Opcode 3 Heartbeat without waiting the remainder of the current interval.

After receiving Opcode 8 Hello, you should send an Opcode 3 Heartbeat, which contains an integer nonce, every time the interval elapses:

Heartbeat Structure

| Field | Type | Description |
|---|---|---|
| t | integer | A unique integer nonce (e.g. the current Unix timestamp) |
| seq_ack? | integer | The last received sequence number |

Example Heartbeat

{  "op": 3,  "d": {    "t": 1501184119561,    "seq_ack": 10  }}

Since Gateway v8, heartbeat messages must include `seq_ack`, which contains the sequence number of the last numbered message received from the Gateway. See Buffered Resume for more information. Previous versions use a flat structure in which the `d` field itself holds the nonce (the value carried by `t` in v8), in both the Heartbeat and Heartbeat ACK structures.

In return, you will be sent back an Opcode 6 Heartbeat ACK that contains the previously sent nonce:

Example Heartbeat ACK

{  "op": 6,  "d": {    "t": 1501184119561  }}
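The heartbeat rules above, including the flat pre-v8 shape and the flat pre-v3 Hello, can be captured in a few helpers. This is a sketch with hypothetical names; it uses the millisecond Unix timestamp as the nonce, as in the example, and sends `seq_ack` as -1 when nothing has been received yet.

```python
import time
from typing import Optional


def hello_interval_ms(hello: dict) -> float:
    # v3+ wraps the body in "d"; older versions send a flat object.
    body = hello.get("d", hello)
    return body["heartbeat_interval"]


def heartbeat_payload(version: int, last_seq: Optional[int] = None,
                      nonce: Optional[int] = None) -> dict:
    if nonce is None:
        nonce = int(time.time() * 1000)
    if version >= 8:
        return {"op": 3, "d": {"t": nonce, "seq_ack": -1 if last_seq is None else last_seq}}
    return {"op": 3, "d": nonce}  # before v8, "d" is the nonce itself


def is_ack_for(message: dict, nonce: int) -> bool:
    if message.get("op") != 6:
        return False
    d = message.get("d")
    return (d.get("t") if isinstance(d, dict) else d) == nonce
```

A client would send `heartbeat_payload(...)` at least every `hello_interval_ms(...)` milliseconds, send one immediately whenever the server sends Opcode 3, and treat a missing matching ACK as a sign the connection may need resuming.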

UDP Connections

UDP is the most likely protocol that clients will use. First, we open a UDP connection to the IP and port provided in the Ready payload. If required, we can now perform an IP Discovery using this connection. Once we've fully discovered our external IP and UDP port, we can then tell the voice WebSocket what it is by sending a Select Protocol as outlined above, and receive our Session Description to begin sending/receiving RTC data.

IP Discovery

Routers on the Internet generally remap UDP ports through a process called network address translation (NAT). Most users who implement voice will want to use IP discovery to find their external IP address and port, which will then be used for receiving voice communications. To retrieve your external IP and port, send the following UDP packet to the voice server's IP and port (all numeric fields are big-endian):

| Field | Type | Description | Size |
|---|---|---|---|
| Type | Unsigned short (big endian) | Values 0x1 and 0x2 indicate request and response, respectively | 2 bytes |
| Length | Unsigned short (big endian) | Message length excluding Type and Length fields (value 70) | 2 bytes |
| SSRC | Unsigned integer (big endian) | The SSRC of the user | 4 bytes |
| Address | Null-terminated string | The external IP address of the user | 64 bytes |
| Port | Unsigned short (big endian) | The external port number of the user | 2 bytes |
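The 74-byte discovery packet maps directly onto a `struct` format string. In this sketch (hypothetical names), the request leaves the address and port fields zeroed, which is an assumption since only their meaning in the response is described. `discover` should be given the same UDP socket you will later send voice on, so that the NAT mapping it reports is the one actually in use; production code would also retry, since UDP packets can be lost.

```python
import socket
import struct
from typing import Tuple

IP_DISCOVERY = struct.Struct(">HHI64sH")  # 74 bytes: type, length, SSRC, address, port


def ip_discovery_request(ssrc: int) -> bytes:
    # Type 0x1 = request; Length 70 excludes the 4 bytes of Type + Length.
    # Assumption: address and port are left zeroed in the request.
    return IP_DISCOVERY.pack(0x1, 70, ssrc, b"", 0)


def parse_ip_discovery_response(packet: bytes) -> Tuple[str, int]:
    if len(packet) < IP_DISCOVERY.size:
        raise ValueError("IP discovery response too short")
    kind, length, _ssrc, raw_address, port = IP_DISCOVERY.unpack_from(packet)
    if kind != 0x2 or length != 70:
        raise ValueError("not an IP discovery response")
    address = raw_address.split(b"\x00", 1)[0].decode("ascii")  # null-terminated
    return address, port


def discover(sock: socket.socket, server: Tuple[str, int], ssrc: int,
             timeout: float = 2.0) -> Tuple[str, int]:
    # server is the (ip, port) pair from the Ready payload.
    sock.settimeout(timeout)
    sock.sendto(ip_discovery_request(ssrc), server)
    data, _ = sock.recvfrom(IP_DISCOVERY.size)
    return parse_ip_discovery_response(data)
```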

Sending and Receiving Voice

Voice data sent to and received from Discord should be encoded or decoded with Opus, using two channels (stereo) and a sample rate of 48 kHz. Video data should be encoded or decoded according to the RFCs relevant to the codec in use. Data is sent as an RTP header followed by the encrypted Opus audio or video payload. Encryption uses the secret key passed in Session Description and a nonce whose format depends on the encryption mode; for the legacy xsalsa20_poly1305 mode, the nonce is the 12-byte RTP header followed by 12 null bytes. Discord encrypts with the libsodium encryption library.

When receiving data, the user who sent the packet is identified by caching the SSRC and user IDs received from Speaking events. At least one Speaking event for the user is received before any frames are received, so the user ID should always be available.

RTP Packet Structure

| Field | Type | Description | Size |
|---|---|---|---|
| Version + Flags ¹ | Unsigned byte | The RTP version and flags (always 0x80 for voice) | 1 byte |
| Payload Type ² | Unsigned byte | The type of payload (0x78 with the default Opus configuration) | 1 byte |
| Sequence | Unsigned short (big endian) | The sequence number of the packet | 2 bytes |
| Timestamp | Unsigned integer (big endian) | The RTP timestamp of the packet | 4 bytes |
| SSRC | Unsigned integer (big endian) | The SSRC of the user | 4 bytes |
| Payload | Binary data | Encrypted audio/video data | n bytes |

¹ If sending an RTP header extension, the flags should have the extension bit (`0x10`) set (e.g. `0x80` becomes `0x90`).

² When sending the final packet of a video frame, the payload type should have the M (marker) bit (`0x80`) set, so the byte becomes `0x80 | payload_type`.
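Packing the 12-byte header from the table takes one `struct` call. The sketch below (hypothetical names) sets the extension and marker bits from the footnotes and tracks sequence and timestamp for one audio SSRC. It assumes 20 ms Opus frames, i.e. 960 samples per packet at 48 kHz, and starts both counters at 0 for simplicity, whereas RFC 3550 recommends random initial values.

```python
import struct

RTP_HEADER = struct.Struct(">BBHII")  # 12 bytes


def rtp_header(payload_type: int, sequence: int, timestamp: int, ssrc: int,
               extension: bool = False, marker: bool = False) -> bytes:
    first = 0x80 | (0x10 if extension else 0)                 # version 2, X bit
    second = (payload_type & 0x7F) | (0x80 if marker else 0)  # M bit
    return RTP_HEADER.pack(first, second, sequence & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)


class OpusRtpCounter:
    """Tracks sequence/timestamp for one audio SSRC (assumes 960-sample frames)."""

    def __init__(self, ssrc: int, payload_type: int = 0x78, samples_per_frame: int = 960) -> None:
        self.ssrc = ssrc
        self.payload_type = payload_type
        self.samples_per_frame = samples_per_frame
        self.sequence = 0
        self.timestamp = 0

    def next_header(self) -> bytes:
        header = rtp_header(self.payload_type, self.sequence, self.timestamp, self.ssrc)
        self.sequence = (self.sequence + 1) & 0xFFFF                               # 16-bit wrap
        self.timestamp = (self.timestamp + self.samples_per_frame) & 0xFFFFFFFF  # 32-bit wrap
        return header
```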

Quality of Service

Discord utilizes RTCP packets to monitor the quality of the connection. Sending and parsing these packets is not required, but is recommended to aid in monitoring the connection and synchronizing audio and video streams. The client should send an RTCP Sender Report roughly every 5 seconds (without padding or reception report blocks) to inform the server of the current state of the connection. Likewise, Discord will send RTCP Receiver Reports to the client to provide feedback on the quality of the connection.
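A plaintext RTCP Sender Report with no reception report blocks and no padding is 28 bytes, laid out per RFC 3550. The sketch below only builds that plaintext packet. It does not cover whether or how the voice server expects RTCP to be encrypted, which you would need to confirm separately, and the helper name is hypothetical.

```python
import struct
import time
from typing import Optional

NTP_UNIX_OFFSET = 2208988800                 # seconds between 1900-01-01 and 1970-01-01
SENDER_REPORT = struct.Struct(">BBHIIIIII")  # 28 bytes


def rtcp_sender_report(ssrc: int, rtp_timestamp: int, packet_count: int,
                       octet_count: int, now: Optional[float] = None) -> bytes:
    if now is None:
        now = time.time()
    ntp_seconds = (int(now) + NTP_UNIX_OFFSET) & 0xFFFFFFFF
    ntp_fraction = int((now % 1) * (1 << 32)) & 0xFFFFFFFF
    return SENDER_REPORT.pack(
        0x80,                        # V=2, P=0, RC=0 (no reception report blocks)
        200,                         # PT=200: Sender Report
        6,                           # length in 32-bit words minus one: 28 / 4 - 1
        ssrc,
        ntp_seconds,
        ntp_fraction,
        rtp_timestamp & 0xFFFFFFFF,  # RTP timestamp matching `now` on the stream's clock
        packet_count & 0xFFFFFFFF,   # RTP packets sent so far
        octet_count & 0xFFFFFFFF,    # payload octets sent so far (headers excluded)
    )
```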

WebRTC Connections

WebRTC allows for direct peer-to-peer voice connections and is most commonly used in browsers. To use WebRTC, first send a Select Protocol payload as outlined above, with the `protocol` field set to `webrtc` and `data` set to the client's WebRTC SDP. The voice server will respond with a Session Description payload whose `sdp` field is set to the server's WebRTC SDP. The client can then use this SDP to establish a WebRTC connection.

Speaking

To notify the voice server that you are speaking or have stopped speaking, send an Opcode 5 Speaking payload:

Speaking Structure

| Field | Type | Description |
|---|---|---|
| speaking ¹ | integer | The speaking flags |
| ssrc | integer | The SSRC of the speaking user |
| user_id ² | snowflake | The user ID of the speaking user |
| delay? ³ | integer | The speaking packet delay |

¹ For Gateway v3 and below, this field is a boolean.

² Only sent by the voice server.

³ Not sent by the voice server.

Speaking Flags

| Value | Name | Description |
|---|---|---|
| 1 << 0 | VOICE | Normal transmission of voice audio |
| 1 << 1 | SOUNDSHARE | Transmission of context audio for video, no speaking indicator |
| 1 << 2 | PRIORITY | Priority speaker, lowering audio of other speakers |

Example Speaking (Send)

{  "op": 5,  "d": {    "speaking": 5,    "delay": 0,    "ssrc": 1  }}
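Because `speaking` is a bitmask from v4 onward, flags combine with bitwise OR: the `5` in the example is VOICE | PRIORITY. A small sketch using Python's `enum.IntFlag` (the helper name is hypothetical), falling back to a boolean for Gateway v3 and below:

```python
from enum import IntFlag
from typing import Union


class SpeakingFlags(IntFlag):
    VOICE = 1 << 0
    SOUNDSHARE = 1 << 1
    PRIORITY = 1 << 2


def speaking_payload(ssrc: int, flags: SpeakingFlags, version: int = 8, delay: int = 0) -> dict:
    # Gateway v3 and below expect a boolean instead of a bitmask.
    speaking: Union[bool, int] = bool(flags) if version <= 3 else int(flags)
    return {"op": 5, "d": {"speaking": speaking, "delay": delay, "ssrc": ssrc}}
```

On the receiving side, `SpeakingFlags(d["speaking"])` turns the integer back into testable flags.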

When a different user's speaking state is updated, and for each user with a speaking state at connection start, the voice server will send an Opcode 5 Speaking payload:

Example Speaking (Receive)

{  "op": 5,  "d": {    "speaking": 5,    "ssrc": 2,    "user_id": "852892297661906993"  }}

Video

To notify the voice server that you are sending video, send an Opcode 12 Video payload:

Video Structure

| Field | Type | Description |
|---|---|---|
| audio_ssrc | integer | The SSRC of the audio stream |
| video_ssrc | integer | The SSRC of the video stream |
| rtx_ssrc ¹ | integer | The SSRC of the retransmission stream |
| streams | array[stream object] | Simulcast streams to send |
| user_id ² | snowflake | The user ID of the video user |

¹ Not sent by the voice server.

² Only sent by the voice server.

Example Video (Send)

{  "op": 12,  "d": {    "audio_ssrc": 13959,    "video_ssrc": 13960,    "rtx_ssrc": 13961,    "streams": [      {        "type": "video",        "rid": "100",        "ssrc": 13960,        "active": true,        "quality": 100,        "rtx_ssrc": 13961,        "max_bitrate": 9000000,        "max_framerate": 60,        "max_resolution": {          "type": "source",          "width": 0,          "height": 0        }      }    ]  }}

When a different user's video state is updated, and for each user with a video state at connection start, the voice server will send an Opcode 12 Video payload:

Example Video (Receive)

{  "op": 12,  "d": {    "user_id": "852892297661906993",    "audio_ssrc": 13959,    "video_ssrc": 13960,    "streams": [      {        "ssrc": 13960,        "rtx_ssrc": 13961,        "rid": "100",        "quality": 100,        "max_resolution": {          "width": 0,          "type": "source",          "height": 0        },        "max_framerate": 60,        "active": true      }    ]  }}

Voice Data Interpolation

When there's a break in the sent data, packet transmission shouldn't simply stop. Instead, send five frames of silence (`0xF8, 0xFF, 0xFE`) before stopping to avoid unintended Opus interpolation with subsequent transmissions.

Likewise, when you receive these five frames of silence, you know that the user has stopped speaking.
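On the sending side this amounts to appending five copies of the three-byte Opus silence frame `0xF8, 0xFF, 0xFE` to each burst; on the receiving side, a run of five such frames marks the end of speech. This is a sketch with hypothetical names. It assumes the frames are already Opus-encoded, and each frame, silence included, still needs its own RTP header and encryption before it is sent.

```python
from typing import Iterable, Iterator

SILENCE_FRAME = b"\xf8\xff\xfe"  # one Opus frame of silence


def with_trailing_silence(frames: Iterable[bytes]) -> Iterator[bytes]:
    """Yield the encoded frames, then five silence frames before going quiet."""
    yield from frames
    for _ in range(5):
        yield SILENCE_FRAME


class SilenceDetector:
    """Reports True once five consecutive silence frames have arrived."""

    def __init__(self) -> None:
        self.run = 0

    def feed(self, frame: bytes) -> bool:
        self.run = self.run + 1 if frame == SILENCE_FRAME else 0
        return self.run >= 5
```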

Resuming Voice Connection

When your client detects that its connection has been severed, it should open a new WebSocket connection. Once the new connection has been opened, your client should send an Opcode 7 Resume payload:

Resume Structure

| Field | Type | Description |
|---|---|---|
| server_id | snowflake | The ID of the guild or private channel being connected to |
| channel_id ¹ | snowflake | The ID of the channel being connected to |
| session_id | string | The session ID of the current session |
| token | string | The voice token for the current session |
| seq_ack? ² | integer | The last received sequence number |

¹ Only required for Gateway v9 and above.

² Only available on Gateway v8 and above.

Example Resume

{  "op": 7,  "d": {    "server_id": "41771983423143937",    "session_id": "30f32c5d54ae86130fc4a215c7474263",    "token": "66d29164ee8cd919"  }}
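Which fields a Resume carries depends on the Gateway version, as the notes above describe. A sketch with a hypothetical helper name:

```python
from typing import Optional


def resume_payload(version: int, server_id: str, session_id: str, token: str,
                   channel_id: Optional[str] = None, seq_ack: Optional[int] = None) -> dict:
    d = {"server_id": server_id, "session_id": session_id, "token": token}
    if version >= 9:
        if channel_id is None:
            raise ValueError("channel_id is required on Gateway v9 and above")
        d["channel_id"] = channel_id
    if version >= 8:
        # Buffered resume: lets the server re-send messages the client missed.
        d["seq_ack"] = -1 if seq_ack is None else seq_ack
    return {"op": 7, "d": d}
```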

If successful, the voice server will respond with an Opcode 9 Resumed to signal that your client is now resumed:

Example Resumed

{  "op": 9,  "d": null}

If the resume is unsuccessful (for example, due to an invalid session), the WebSocket connection will close with the appropriate close code. You should then follow the Connecting flow to reconnect.

Buffered Resume

Since version 8, the Gateway can resend buffered messages that have been lost upon resume. To support this, the Gateway includes a sequence number with all messages that may need to be re-sent.

Example Message With Sequence Number

{  "op": 5,  "d": {    "speaking": 0,    "delay": 0,    "ssrc": 110  },  "seq": 10}

A client using Gateway v8 must include the last sequence number it received under the data key as `seq_ack` in both the Opcode 3 Heartbeat and Opcode 7 Resume payloads. If no sequence-numbered messages have been received, `seq_ack` can be omitted or included with a value of -1.

The Gateway uses a fixed bit length sequence number and handles wrapping the sequence number around. Since Gateway messages will always arrive in order, a client only needs to retain the last sequence number they have seen.
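Since messages arrive in order and wrap-around is handled by the server, tracking the sequence number reduces to remembering the newest `seq` seen. A sketch with a hypothetical class name, accepting either raw JSON text or an already-decoded message:

```python
import json
from typing import Optional, Union


class SeqTracker:
    """Remembers the last voice Gateway sequence number (v8+)."""

    def __init__(self) -> None:
        self.last: Optional[int] = None

    def observe(self, message: Union[str, dict]) -> dict:
        if isinstance(message, str):
            message = json.loads(message)
        seq = message.get("seq")
        if seq is not None:
            # In-order delivery means the newest value simply replaces the old one.
            self.last = seq
        return message

    @property
    def seq_ack(self) -> int:
        return -1 if self.last is None else self.last
```

`tracker.seq_ack` can then be placed directly into both Heartbeat and Resume payloads.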

If the session is successfully resumed, the Gateway will respond with an Opcode 9 Resumed and will re-send any messages that the client did not receive.

The resume may be unsuccessful if the buffer for the session no longer contains a message that has been missed. In this case the session will be closed and you should then follow the Connecting flow to reconnect.

Connected Clients

Client Connections

At connection start, and whenever a client subsequently connects to voice, the voice server will send a series of events: an Opcode 11 Client Connect, plus an Opcode 18 Client Flags and an Opcode 20 Client Platform for every joined user.

These events are meant to inform a new client of all existing clients and their flags/platform, and inform existing clients of a newly-connected client.

Client Connect Structure

| Field | Type | Description |
|---|---|---|
| user_ids | array[snowflake] | The IDs of the users that connected |

Example Client Connect

{  "op": 11,  "d": {    "user_ids": ["852892297661906993"]  }}

Client Flags Structure

| Field | Type | Description |
|---|---|---|
| user_id | snowflake | The ID of the user that connected |
| flags | ?integer | The user's voice flags |

Voice Flags

| Value | Name | Description |
|---|---|---|
| 1 << 0 | CLIPS_ENABLED | User has clips enabled |
| 1 << 1 | ALLOW_VOICE_RECORDING | User has allowed their voice to be recorded in another user's clips |
| 1 << 2 | ALLOW_ANY_VIEWER_CLIPS | User has allowed stream viewers to clip them |

Example Client Flags

{  "op": 18,  "d": {    "user_id": "852892297661906993",    "flags": 3  }}
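The `flags` value decodes the same way as the speaking bitmask; the example's `3` is CLIPS_ENABLED | ALLOW_VOICE_RECORDING. A sketch with a hypothetical parser name. Since `flags` is nullable, it treats null as "no flags set", which is an assumption.

```python
from enum import IntFlag
from typing import Tuple


class VoiceFlags(IntFlag):
    CLIPS_ENABLED = 1 << 0
    ALLOW_VOICE_RECORDING = 1 << 1
    ALLOW_ANY_VIEWER_CLIPS = 1 << 2


def parse_client_flags(message: dict) -> Tuple[str, VoiceFlags]:
    d = message["d"]
    # flags is nullable; this sketch treats null as "no flags set".
    return d["user_id"], VoiceFlags(d["flags"] or 0)
```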

Client Platform Structure

| Field | Type | Description |
|---|---|---|
| user_id | snowflake | The ID of the user that connected |
| platform | ?integer | The user's voice platform |

Voice Platform

Value

Name

Description

DESKTOP

Desktop-based client

MOBILE

Mobile client

XBOX

Xbox integration

PLAYSTATION

PlayStation integration

Example Client Platform

{  "op": 20,  "d": {    "user_id": "852892297661906993",    "platform": 0  }}

Client Disconnections

When a user disconnects from voice, the voice server will send an Opcode 13 Client Disconnect:

When received, the SSRC of the user should be discarded.

Client Disconnect Structure

| Field | Type | Description |
|---|---|---|
| user_id | snowflake | The ID of the user that disconnected |

Example Client Disconnect

{  "op": 13,  "d": {    "user_id": "852892297661906993"  }}
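Attributing incoming packets comes down to an SSRC-to-user map that is filled from Speaking (and Video) events and pruned on Client Disconnect. A sketch with a hypothetical class name. Skipping zero-valued SSRCs in Video events is an assumption, and the SSRC is read from bytes 8 to 11 of the unencrypted RTP header.

```python
import struct
from typing import Dict, Optional


class SsrcRegistry:
    """Maps SSRCs from incoming RTP packets to user IDs (hypothetical helper)."""

    def __init__(self) -> None:
        self._users: Dict[int, str] = {}

    def handle(self, message: dict) -> None:
        op, d = message.get("op"), message.get("d") or {}
        if op == 5 and "user_id" in d:        # Speaking from another user
            self._users[d["ssrc"]] = d["user_id"]
        elif op == 12 and "user_id" in d:     # Video from another user
            for key in ("audio_ssrc", "video_ssrc"):
                if d.get(key):                # assumption: skip missing or zero SSRCs
                    self._users[d[key]] = d["user_id"]
        elif op == 13:                        # Client Disconnect: discard their SSRCs
            gone = d["user_id"]
            self._users = {s: u for s, u in self._users.items() if u != gone}

    def user_for_packet(self, packet: bytes) -> Optional[str]:
        ssrc = struct.unpack_from(">I", packet, 8)[0]  # SSRC field of the RTP header
        return self._users.get(ssrc)
```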

Simulcasting

The voice server supports simulcasting, allowing clients to send multiple video streams of different qualities and adjust the quality of the video stream they receive to fit bandwidth constraints. This can be used to lower the quality of a received video stream when the user is not in focus, or to disable the transmission of a voice or video stream entirely when a user is off-screen or a client has muted them.

A media stream specified by a given SSRC can be requested at a quality level between 0 and 100, with 0 disabling it entirely and 100 being the highest quality. Additionally, if the user offers multiple streams for a given media type, the client can request a specific stream by setting its quality level to 100 and the others to 0. A special SSRC value of can be used to request a quality level for all streams.

Clients may request the media quality they want per SSRC by sending an Opcode 15 Media Sink Wants payload with a mapping of SSRCs to quality levels. Clients may also specify a `pixelCounts` field to indicate the preferred resolution of the video stream for each SSRC, which the voice server can use to determine the best quality level to send based on the client's capabilities and preferences.

Likewise, the voice server may send an Opcode 15 Media Sink Wants payload to inform the client of the quality levels it should send for each SSRC.

Example Media Sink Wants

{  "op": 15,  "d": {    "8964": 100,    "pixelCounts": {      "8964": 1189844.5769597634    }  }}
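Building the payload means turning an SSRC-to-quality mapping into string-keyed JSON, optionally with `pixelCounts`. The sketch below uses hypothetical helper names. `focus_only` expresses the "one stream at 100, the rest at 0" pattern described above. The special all-streams SSRC value is not covered here.

```python
from typing import Dict, Iterable, Optional


def focus_only(ssrcs: Iterable[int], chosen: int) -> Dict[int, int]:
    # Request one stream at full quality and disable the rest.
    return {ssrc: (100 if ssrc == chosen else 0) for ssrc in ssrcs}


def media_sink_wants(qualities: Dict[int, int],
                     pixel_counts: Optional[Dict[int, float]] = None) -> dict:
    d: dict = {}
    for ssrc, quality in qualities.items():
        if not 0 <= quality <= 100:
            raise ValueError(f"quality for SSRC {ssrc} must be between 0 and 100")
        d[str(ssrc)] = quality  # JSON object keys are strings
    if pixel_counts:
        d["pixelCounts"] = {str(ssrc): float(count) for ssrc, count in pixel_counts.items()}
    return {"op": 15, "d": d}
```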

Voice Backend Version

For analytics, the client may want to receive information about the voice backend's current version. To do so, send an Opcode 16 Voice Backend Version with an empty payload:

Voice Backend Version Structure

| Field | Type | Description |
|---|---|---|
| voice | string | The voice backend's version |
| rtc_worker | string | The WebRTC worker's version |

Example Voice Backend Version (Send)

{  "op": 16,  "d": {}}

In response, the voice server will send an Opcode 16 Voice Backend Version payload with the versions:

Example Voice Backend Version (Receive)

{  "op": 16,  "d": {    "voice": "0.9.1",    "rtc_worker": "0.3.35"  }}

Streams

Stream connections operate in a similar fashion to regular voice connections. In fact, on the protocol side, they are identical and use all of the payloads and processes described above. The main differences are within the Gateway protocol, as streams are started and joined differently from regular voice connections.

Connecting to Streams

To start or join a stream, the client must first be connected to the voice instance that the stream is hosted on. Then, send a Create Stream or Watch Stream payload to the Gateway.

If our request succeeds, as with voice, you must wait for the Gateway to respond with two events: a Stream Create event and a Stream Server Update event. You can then use the information provided in these events to establish a connection to the stream server as outlined in Connecting to Voice. Note that the `server_id` used when identifying will be provided in the Stream Create event.

Note that if joining a stream fails, the Gateway will instead respond with a Stream Delete event which will contain the reason for the failure.