Improve "Voice over IP" module of CS API (#2374)

* Convert m.call.* schemas syntax to YAML

For consistency.

Signed-off-by: Kévin Commaille <zecakeh@tedomum.fr>

* Clarify user ID format in m.call.invite

Signed-off-by: Kévin Commaille <zecakeh@tedomum.fr>

* Add links to definitions

Signed-off-by: Kévin Commaille <zecakeh@tedomum.fr>

* Improve call schemas

To look more consistent with other schemas.

Signed-off-by: Kévin Commaille <zecakeh@tedomum.fr>

* Clarify URI format in GET /voip/turnServer

Signed-off-by: Kévin Commaille <zecakeh@tedomum.fr>

* Add changelog

Signed-off-by: Kévin Commaille <zecakeh@tedomum.fr>

* Fix regex

Signed-off-by: Kévin Commaille <zecakeh@tedomum.fr>

---------

Signed-off-by: Kévin Commaille <zecakeh@tedomum.fr>
Kévin Commaille 2026-05-12 17:36:31 +02:00 committed by GitHub
parent 8bedf3882c
commit ba960f8d32
6 changed files with 166 additions and 169 deletions


@@ -0,0 +1 @@
Clarify formats of string types.


@@ -1,20 +1,20 @@
### Voice over IP
This module outlines how two users in a room can set up a Voice over IP
(VoIP) call to each other. Voice and video calls are built upon the
WebRTC 1.0 standard. Call signalling is achieved by sending [message
events](#events) to the room. In this version of the spec, only two-party
communication is supported (e.g. between two peers, or between a peer
This module outlines how two users in a room can set up a Voice over IP (VoIP)
call to each other. Voice and video calls are built upon the [WebRTC 1.0
standard](https://www.w3.org/TR/webrtc/). Call signalling is achieved by sending
[message events](#events) to the room. In this version of the spec, only
two-party communication is supported (e.g. between two peers, or between a peer
and a multi-point conferencing unit). Calls can take place in rooms with
multiple members, but only two devices can take part in the call.
All VoIP events have a `version` field. This is used to determine whether
devices support this new version of the protocol. For example, clients can use
this field to know whether to expect an `m.call.select_answer` event from their
opponent. If clients see events with `version` other than `0` or `"1"`
(including, for example, the numeric value `1`), they should treat these the
same as if they had `version` == `"1"`.
this field to know whether to expect an [`m.call.select_answer`](#mcallselect_answer)
event from their opponent. If clients see events with `version` other than `0`
or `"1"` (including, for example, the numeric value `1`), they should treat
these the same as if they had `version` == `"1"`.
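As an illustration of the `version` rule in this paragraph, a client might normalise the field along these lines. This is a minimal Python sketch and not part of the commit: the helper name `normalize_call_version` is hypothetical, and accepting only the JSON integer `0` (not booleans or floats) as version 0 is an assumption.

```python
def normalize_call_version(version):
    """Map a VoIP event's `version` value to the version a client should assume.

    Hypothetical helper: the integer 0 is kept as 0; every other value,
    including the string "1" and the number 1, is treated as "1".
    """
    # bool is a subclass of int in Python, so True/False are excluded explicitly.
    is_plain_int = isinstance(version, int) and not isinstance(version, bool)
    if is_plain_int and version == 0:
        return 0
    return "1"
```

Any value the client does not recognise, such as `"2"` or a missing field parsed as `None`, falls through to `"1"`.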
Note that this implies any and all future versions of VoIP events should be
backwards-compatible. If it does become necessary to introduce a non
@@ -29,10 +29,10 @@ lowercase alphanumeric characters is recommended. Parties in the call are identi
`(user_id, party_id)`.
The client adds a `party_id` field containing this ID to the top-level of the content of all VoIP events
it sends on the call, including `m.call.invite`. Clients use this to identify remote echo of their own
events: since a user may call themselves, they cannot simply ignore events from their own user. This
field also identifies different answers sent by different clients to an invite, and matches `m.call.candidates`
events to their respective answer/invite.
it sends on the call, including [`m.call.invite`](#mcallinvite). Clients use this to identify remote echo
of their own events: since a user may call themselves, they cannot simply ignore events from their own
user. This field also identifies different answers sent by different clients to an invite, and matches
[`m.call.candidates`](#mcallcandidates) events to their respective answer/invite.
A client implementation may choose to use the device ID used in end-to-end cryptography for this purpose,
or it may choose, for example, to use a different one for each call to avoid leaking information on which
@@ -44,15 +44,16 @@ A grammar for `party_id` is defined [below](#grammar-for-voip-ids).
#### Politeness
In line with [WebRTC perfect negotiation](https://w3c.github.io/webrtc-pc/#perfect-negotiation-example)
there are rules to establish which party is polite in the process of renegotiation. The callee is
always the polite party. In a glare situation, the politeness of a party is therefore determined by
whether the inbound or outbound call is used: if a client discards its outbound call in favour of
an inbound call, it becomes the polite party.
always the polite party. In a [glare](#glare) situation, the politeness of a party is therefore
determined by whether the inbound or outbound call is used: if a client discards its outbound call
in favour of an inbound call, it becomes the polite party.
#### Call Event Liveness
`m.call.invite` contains a `lifetime` field that indicates how long the offer is valid for. When
a client receives an invite, it should use the event's `age` field in the sync response plus the
time since it received the event from the homeserver to determine whether the invite is still valid.
The use of the `age` field ensures that incorrect clocks on client devices don't break calls.
[`m.call.invite`](#mcallinvite) contains a `lifetime` field that indicates how long the offer is
valid for. When a client receives an invite, it should use the event's `age` field in the
[`GET /sync`](#get_matrixclientv3sync) response plus the time since it received the event from the
homeserver to determine whether the invite is still valid. The use of the `age` field ensures that
incorrect clocks on client devices don't break calls.
If the invite is still valid *and will remain valid for long enough for the user to accept the call*,
it should signal an incoming call. The amount of time allowed for the user to accept the call may
@@ -83,7 +84,7 @@ Clients should aim to send a small number of candidate events, with guidelines:
#### End-of-candidates
An ICE candidate whose value is the empty string means that no more ICE candidates will
be sent. Clients must send such a candidate in an `m.call.candidates` message.
be sent. Clients must send such a candidate in an [`m.call.candidates`](#mcallcandidates) message.
The WebRTC spec requires browsers to generate such a candidate, however note that at time of writing,
not all browsers do (Chrome does not, but does generate an `icegatheringstatechange` event). The
client should send any remaining candidates once candidate generation finishes, ignoring timeouts above.
@@ -130,36 +131,48 @@ or not there have been any changes to the Matrix spec.
A call is set up with message events exchanged as follows:
```nohighlight
Caller                    Callee
[Place Call]
m.call.invite ----------->
m.call.candidate -------->
[..candidates..] -------->
                          [Answers call]
          <--------------- m.call.answer
m.call.select_answer ----------->
      [Call is active and ongoing]
          <--------------- m.call.hangup
+---------+                    +---------+
| Caller  |                    | Callee  |
+---------+                    +---------+
     |                              |
(Places Call)                       |
     |------- m.call.invite ------->|
     |----- m.call.candidate ------>|
     |----- [..candidates..] ------>|
     |                              |
     |                        (Answers call)
     |<------ m.call.answer --------|
     |--- m.call.select_answer --->|
     .                              .
     . (Call is active and ongoing) .
     .                              .
     |                         (Ends call)
     |<------ m.call.hangup --------|
```
Or a rejected call:
```nohighlight
Caller                   Callee
m.call.invite ------------>
m.call.candidate --------->
[..candidates..] --------->
                         [Rejects call]
         <-------------- m.call.hangup
+---------+                    +---------+
| Caller  |                    | Callee  |
+---------+                    +---------+
     |                              |
(Places Call)                       |
     |------- m.call.invite ------->|
     |----- m.call.candidate ------>|
     |----- [..candidates..] ------>|
     |                              |
     |                        (Rejects call)
     |<------ m.call.reject --------|
```
Calls are negotiated according to the WebRTC specification.
In response to an incoming invite, a client may do one of several things:
* Attempt to accept the call by sending an `m.call.answer`.
* Actively reject the call everywhere: send an `m.call.reject` as per above, which will stop the call from
ringing on all the user's devices and the caller's client will inform them that the user has
rejected their call.
* Attempt to accept the call by sending an [`m.call.answer`](#mcallanswer).
* Actively reject the call everywhere: send an [`m.call.reject`](#mcallreject) as per above, which
will stop the call from ringing on all the user's devices and the caller's client will inform
them that the user has rejected their call.
* Ignore the call: send no events, but stop alerting the user about the call. The user's other
devices will continue to ring, and the caller's device will continue to indicate that the call
is ringing, and will time the call out in the normal way if no other device responds.
@@ -224,8 +237,8 @@ As calls are "placed" to rooms rather than users, the glare resolution
algorithm outlined below is only considered for calls which are to the
same room. The algorithm is as follows:
- If an `m.call.invite` to a room is received whilst the client is
**preparing to send** an `m.call.invite` to the same room:
- If an [`m.call.invite`](#mcallinvite) to a room is received whilst the
client is **preparing to send** an `m.call.invite` to the same room:
- the client should cancel its outgoing call and instead
automatically accept the incoming call on behalf of the user.
- If an `m.call.invite` to a room is received **after the client has


@@ -44,6 +44,8 @@ paths:
  type: array
  items:
    type: string
    format: uri
    pattern: "^turns?:"
  description: A list of TURN URIs
ttl:
  type: integer
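
To show what the added `pattern` accepts, here is a minimal Python sketch that filters a list of URIs with the same regular expression. It is not part of the commit: the helper name `filter_turn_uris` and the example hostnames are hypothetical. JSON Schema patterns are not implicitly anchored, so the sketch uses `search` and relies on the explicit `^`.

```python
import re

# Same expression as the schema: the URI must start with "turn:" or "turns:".
TURN_URI_PATTERN = re.compile(r"^turns?:")


def filter_turn_uris(uris):
    """Return only the entries that are strings matching the schema pattern."""
    return [u for u in uris if isinstance(u, str) and TURN_URI_PATTERN.search(u)]


candidates = [
    "turn:turn.example.com:3478?transport=udp",
    "turns:turn.example.com:5349",
    "stun:stun.example.com",
    "https://example.com",
]
accepted = filter_turn_uris(candidates)
```

With this input, only the two `turn:`/`turns:` entries remain in `accepted`.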


@@ -1,44 +1,37 @@
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "description": "This event is sent by the callee when they wish to answer the call.",
  "x-weight": 40,
  "allOf": [{
    "$ref": "core-event-schema/room_event.yaml"
  }],
  "properties": {
    "content": {
      "type": "object",
      "allOf": [{
        "$ref": "core-event-schema/call_event.yaml"
      }],
      "properties": {
        "answer": {
          "type": "object",
          "title": "Answer",
          "description": "The session description object",
          "properties": {
            "type": {
              "type": "string",
              "enum": ["answer"],
              "description": "The type of session description."
            },
            "sdp": {
              "type": "string",
              "description": "The SDP text of the session description."
            }
          },
          "required": ["type", "sdp"]
        },
        "sdp_stream_metadata": {
          "$ref": "components/sdp_stream_metadata.yaml"
        }
      },
      "required": ["answer"]
    },
    "type": {
      "type": "string",
      "enum": ["m.call.answer"]
    }
  }
}
$schema: https://json-schema.org/draft/2020-12/schema
type: object
description: This event is sent by the callee when they wish to answer the call.
x-weight: 40
allOf:
  - $ref: core-event-schema/room_event.yaml
properties:
  content:
    type: object
    allOf:
      - $ref: core-event-schema/call_event.yaml
    properties:
      answer:
        type: object
        title: Answer
        description: The session description object
        properties:
          type:
            type: string
            enum:
              - answer
            description: The type of session description.
          sdp:
            type: string
            description: The SDP text of the session description.
        required:
          - type
          - sdp
      sdp_stream_metadata:
        $ref: components/sdp_stream_metadata.yaml
    required:
      - answer
  type:
    type: string
    enum:
      - m.call.answer


@@ -1,53 +1,47 @@
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "description": "This event is sent by the caller when they wish to establish a call.",
  "x-weight": 10,
  "allOf": [{
    "$ref": "core-event-schema/room_event.yaml"
  }],
  "properties": {
    "content": {
      "type": "object",
      "allOf": [{
        "$ref": "core-event-schema/call_event.yaml"
      }],
      "properties": {
        "offer": {
          "type": "object",
          "title": "Offer",
          "description": "The session description object",
          "properties": {
            "type": {
              "type": "string",
              "enum": ["offer"],
              "description": "The type of session description."
            },
            "sdp": {
              "type": "string",
              "description": "The SDP text of the session description."
            }
          },
          "required": ["type", "sdp"]
        },
        "lifetime": {
          "type": "integer",
          "description": "The time in milliseconds that the invite is valid for. Once the invite age exceeds this value, clients should discard it. They should also no longer show the call as awaiting an answer in the UI."
        },
        "invitee": {
          "type": "string",
          "description": "The ID of the user being called. If omitted, any user in the room can answer.",
          "x-addedInMatrixVersion": "1.7"
        },
        "sdp_stream_metadata": {
          "$ref": "components/sdp_stream_metadata.yaml"
        }
      },
      "required": ["offer", "lifetime"]
    },
    "type": {
      "type": "string",
      "enum": ["m.call.invite"]
    }
  }
}
$schema: https://json-schema.org/draft/2020-12/schema
type: object
description: This event is sent by the caller when they wish to establish a call.
x-weight: 10
allOf:
  - $ref: core-event-schema/room_event.yaml
properties:
  content:
    type: object
    allOf:
      - $ref: core-event-schema/call_event.yaml
    properties:
      offer:
        type: object
        title: Offer
        description: The session description object
        properties:
          type:
            type: string
            enum:
              - offer
            description: The type of session description.
          sdp:
            type: string
            description: The SDP text of the session description.
        required:
          - type
          - sdp
      lifetime:
        type: integer
        description: The time in milliseconds that the invite is valid for. Once the invite age exceeds this value, clients should discard it. They should also no longer show the call as awaiting an answer in the UI.
      invitee:
        type: string
        description: The ID of the user being called. If omitted, any user in the room can answer.
        x-addedInMatrixVersion: '1.7'
        format: mx-user-id
        pattern: "^@"
      sdp_stream_metadata:
        $ref: components/sdp_stream_metadata.yaml
    required:
      - offer
      - lifetime
  type:
    type: string
    enum:
      - m.call.invite
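
The `lifetime` and `invitee` fields above can be combined into a simple "should this device ring?" check. The following is a minimal Python sketch under stated assumptions, not part of the commit: the function names and the `min_answer_window_ms` parameter are hypothetical, and ignoring invites whose `invitee` names another user is an assumed reading of the field description. The elapsed time is the event's `age` plus the time since the client received it, as the Call Event Liveness text describes.

```python
def invite_remaining_ms(lifetime_ms, age_ms, ms_since_received):
    """Milliseconds left before the invite expires (negative once expired)."""
    return lifetime_ms - (age_ms + ms_since_received)


def should_ring(content, age_ms, ms_since_received, my_user_id,
                min_answer_window_ms=0):
    """Decide whether to signal an incoming call for an m.call.invite.

    `content` is the event content as a dict. `min_answer_window_ms` is a
    hypothetical margin so the user has time left to accept the call.
    """
    invitee = content.get("invitee")
    # Assumption: an invite addressed to a different user is not for us.
    if invitee is not None and invitee != my_user_id:
        return False
    remaining = invite_remaining_ms(content["lifetime"], age_ms, ms_since_received)
    return remaining > min_answer_window_ms
```

Relying on `age` rather than a local timestamp keeps the check independent of the client's wall clock.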


@@ -1,29 +1,23 @@
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "description": "This event is sent by the caller's client once it has decided which other client to talk to, by selecting one of multiple possible incoming `m.call.answer` events. Its `selected_party_id` field indicates the answer it's chosen. The `call_id` and `party_id` of the caller is also included. If the callee's client sees a `select_answer` for an answer with party ID other than the one it sent, it ends the call and informs the user the call was answered elsewhere. It does not send any events. Media can start flowing before this event is seen or even sent. Clients that implement previous versions of this specification will ignore this event and behave as they did before.",
  "x-addedInMatrixVersion": "1.7",
  "x-weight": 50,
  "allOf": [{
    "$ref": "core-event-schema/room_event.yaml"
  }],
  "properties": {
    "content": {
      "type": "object",
      "allOf": [{
        "$ref": "core-event-schema/call_event.yaml"
      }],
      "properties": {
        "selected_party_id": {
          "type": "string",
          "description": "The `party_id` field from the answer event that the caller chose."
        },
      },
      "required": ["selected_party_id"]
    },
    "type": {
      "type": "string",
      "enum": ["m.call.select_answer"]
    }
  }
}
$schema: https://json-schema.org/draft/2020-12/schema
type: object
description: This event is sent by the caller's client once it has decided which other client to talk to, by selecting one of multiple possible incoming `m.call.answer` events. Its `selected_party_id` field indicates the answer it's chosen. The `call_id` and `party_id` of the caller is also included. If the callee's client sees a `select_answer` for an answer with party ID other than the one it sent, it ends the call and informs the user the call was answered elsewhere. It does not send any events. Media can start flowing before this event is seen or even sent. Clients that implement previous versions of this specification will ignore this event and behave as they did before.
x-addedInMatrixVersion: '1.7'
x-weight: 50
allOf:
  - $ref: core-event-schema/room_event.yaml
properties:
  content:
    type: object
    allOf:
      - $ref: core-event-schema/call_event.yaml
    properties:
      selected_party_id:
        type: string
        description: The `party_id` field from the answer event that the caller chose.
    required:
      - selected_party_id
  type:
    type: string
    enum:
      - m.call.select_answer
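
The callee-side behaviour in the description above can be sketched as a small decision function. This is a minimal Python illustration and not part of the commit: the function name and the returned labels `"continue"` and `"answered_elsewhere"` are hypothetical.

```python
def handle_select_answer(content, my_party_id):
    """Decide what a callee's client does on seeing m.call.select_answer.

    `content` is the event content as a dict; `my_party_id` is the party ID
    this client used in its own m.call.answer.
    """
    if content["selected_party_id"] == my_party_id:
        # The caller picked this client's answer: carry on with the call.
        return "continue"
    # Another answer was chosen: end the call locally, tell the user it was
    # answered elsewhere, and send no events.
    return "answered_elsewhere"
```

For example, a client with party ID `"abc123"` that sees `selected_party_id` set to `"xyz789"` would end the call locally without sending a hangup.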