mirror of
https://github.com/matrix-org/matrix-spec
synced 2026-03-11 05:54:10 +01:00
Merge pull request #1032 from matrix-org/rav/mxid_grammar
Indentifier grammar updates
commit
6282a53ca9
|
|
@ -45,6 +45,16 @@ paths:
|
||||||
If the client does not supply a ``device_id``, the server must
|
If the client does not supply a ``device_id``, the server must
|
||||||
auto-generate one.
|
auto-generate one.
|
||||||
|
|
||||||
|
The server SHOULD register an account with a User ID based on the
|
||||||
|
``username`` provided, if any. Note that the grammar of Matrix User ID
|
||||||
|
localparts is restricted, so the server MUST either map the provided
|
||||||
|
``username`` onto a ``user_id`` in a logical manner, or reject
|
||||||
|
``username``\s which do not comply with the grammar, with
|
||||||
|
``M_INVALID_USERNAME``.
|
||||||
|
|
||||||
|
Matrix clients MUST NOT assume that the localpart of the registered
|
||||||
|
``user_id`` matches the provided ``username``.
|
||||||
|
|
||||||
The returned access token must be associated with the ``device_id``
|
The returned access token must be associated with the ``device_id``
|
||||||
supplied by the client or generated by the server. The server may
|
supplied by the client or generated by the server. The server may
|
||||||
invalidate any access token previously associated with that device. See
|
invalidate any access token previously associated with that device. See
|
||||||
|
|
@ -86,7 +96,7 @@ paths:
|
||||||
username:
|
username:
|
||||||
type: string
|
type: string
|
||||||
description: |-
|
description: |-
|
||||||
The local part of the desired Matrix ID. If omitted,
|
The basis for the localpart of the desired Matrix ID. If omitted,
|
||||||
the homeserver MUST generate a Matrix ID local part.
|
the homeserver MUST generate a Matrix ID local part.
|
||||||
example: cheeky_monkey
|
example: cheeky_monkey
|
||||||
password:
|
password:
|
||||||
|
|
@ -121,7 +131,11 @@ paths:
|
||||||
properties:
|
properties:
|
||||||
user_id:
|
user_id:
|
||||||
type: string
|
type: string
|
||||||
description: The fully-qualified Matrix ID that has been registered.
|
description: |-
|
||||||
|
The fully-qualified Matrix user ID (MXID) that has been registered.
|
||||||
|
|
||||||
|
Any user ID returned by this API must conform to the grammar given in the
|
||||||
|
`Matrix specification <https://matrix.org/docs/spec/appendices.html#user-identifiers>`_.
|
||||||
access_token:
|
access_token:
|
||||||
type: string
|
type: string
|
||||||
description: |-
|
description: |-
|
||||||
|
|
|
||||||
|
|
@ -92,6 +92,9 @@
|
||||||
- Add some clarifying notes on the behaviour of rooms with no
|
- Add some clarifying notes on the behaviour of rooms with no
|
||||||
``m.room.power_levels`` event
|
``m.room.power_levels`` event
|
||||||
(`#1026 <https://github.com/matrix-org/matrix-doc/pull/1026>`_).
|
(`#1026 <https://github.com/matrix-org/matrix-doc/pull/1026>`_).
|
||||||
|
- Clarify the relationship between ``username`` and ``user_id`` in the
|
||||||
|
``/register`` API
|
||||||
|
(`#1032 <https://github.com/matrix-org/matrix-doc/pull/1032>`_).
|
||||||
|
|
||||||
r0.2.0
|
r0.2.0
|
||||||
======
|
======
|
||||||
|
|
|
||||||
225
specification/appendices/identifier_grammar.rst
Normal file
225
specification/appendices/identifier_grammar.rst
Normal file
|
|
@ -0,0 +1,225 @@
|
||||||
|
.. Copyright 2016 Openmarket Ltd.
|
||||||
|
.. Copyright 2017 New Vector Ltd.
|
||||||
|
..
|
||||||
|
.. Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
.. you may not use this file except in compliance with the License.
|
||||||
|
.. You may obtain a copy of the License at
|
||||||
|
..
|
||||||
|
.. http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
..
|
||||||
|
.. Unless required by applicable law or agreed to in writing, software
|
||||||
|
.. distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
.. See the License for the specific language governing permissions and
|
||||||
|
.. limitations under the License.
|
||||||
|
|
||||||
|
Identifier Grammar
|
||||||
|
------------------
|
||||||
|
|
||||||
|
Server Name
|
||||||
|
~~~~~~~~~~~
|
||||||
|
|
||||||
|
A homeserver is uniquely identified by its server name. This value is used in a
|
||||||
|
number of identifiers, as described below.
|
||||||
|
|
||||||
|
The server name represents the address at which the homeserver in question can
|
||||||
|
be reached by other homeservers. The complete grammar is::
|
||||||
|
|
||||||
|
server_name = dns_name [ ":" port]
|
||||||
|
dns_name = host
|
||||||
|
port = *DIGIT
|
||||||
|
|
||||||
|
where ``host`` is as defined by `RFC3986, section 3.2.2
|
||||||
|
<https://tools.ietf.org/html/rfc3986#section-3.2.2>`_.
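As an illustration only, the grammar above could be applied as in the following Python sketch. The helper name ``split_server_name`` is hypothetical and not part of this specification. The sketch separates the host from the optional port, taking care of bracketed IPv6 literals, and makes no attempt to validate ``host`` against RFC3986. Because the grammar defines ``port`` as ``*DIGIT``, an empty port after the colon is accepted here.

```python
import re

# port = *DIGIT, as in the grammar above (ASCII digits only).
_PORT_RE = re.compile(r"[0-9]*")


def split_server_name(server_name):
    """Split a server name into (host, port); port is None when absent.

    Hypothetical helper: the host is NOT validated against RFC 3986.
    """
    if server_name.startswith("["):
        # IPv6 literal: the host runs up to and including the closing bracket.
        end = server_name.find("]")
        if end == -1:
            raise ValueError("unterminated IPv6 literal")
        host, rest = server_name[:end + 1], server_name[end + 1:]
    else:
        # DNS name or IPv4 literal: the host contains no colon.
        host, sep, tail = server_name.partition(":")
        rest = sep + tail
    if not host:
        raise ValueError("empty host")
    if not rest:
        return host, None
    if not rest.startswith(":") or not _PORT_RE.fullmatch(rest[1:]):
        raise ValueError("invalid port")
    return host, rest[1:]
```

For example, ``split_server_name("[1234:5678::abcd]:5678")`` yields the host ``[1234:5678::abcd]`` and the port ``5678``.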
|
||||||
|
|
||||||
|
Examples of valid server names are:
|
||||||
|
|
||||||
|
* ``matrix.org``
|
||||||
|
* ``matrix.org:8888``
|
||||||
|
* ``1.2.3.4`` (IPv4 literal)
|
||||||
|
* ``1.2.3.4:1234`` (IPv4 literal with explicit port)
|
||||||
|
* ``[1234:5678::abcd]`` (IPv6 literal)
|
||||||
|
* ``[1234:5678::abcd]:5678`` (IPv6 literal with explicit port)
|
||||||
|
|
||||||
|
|
||||||
|
Common Identifier Format
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
The Matrix protocol uses a common format to assign unique identifiers to a
|
||||||
|
number of entities, including users, events and rooms. Each identifier takes
|
||||||
|
the form::
|
||||||
|
|
||||||
|
&localpart:domain
|
||||||
|
|
||||||
|
where ``&`` represents a 'sigil' character; ``domain`` is the `server name`_ of
|
||||||
|
the homeserver which allocated the identifier, and ``localpart`` is an
|
||||||
|
identifier allocated by that homeserver.
|
||||||
|
|
||||||
|
The sigil characters are as follows:
|
||||||
|
|
||||||
|
* ``@``: User ID
|
||||||
|
* ``!``: Room ID
|
||||||
|
* ``$``: Event ID
|
||||||
|
* ``#``: Room alias
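A minimal Python sketch of how an implementation might take such an identifier apart is given below. The function name ``split_identifier`` is an assumption made for illustration. It splits at the first ``:``, which assumes that the localpart itself contains no colon; this holds for the user ID grammar given later, but not necessarily for historical user IDs.

```python
# Sigil characters and the kind of identifier each one introduces.
SIGILS = {
    "@": "user ID",
    "!": "room ID",
    "$": "event ID",
    "#": "room alias",
}


def split_identifier(identifier):
    """Return (kind, localpart, domain) for a Matrix identifier.

    Hypothetical helper: splits at the FIRST colon, so it assumes the
    localpart contains no ':' (the domain may, e.g. for a port).
    """
    if not identifier or identifier[0] not in SIGILS:
        raise ValueError("unknown or missing sigil")
    localpart, sep, domain = identifier[1:].partition(":")
    if not sep:
        raise ValueError("missing ':' separator")
    return SIGILS[identifier[0]], localpart, domain
```

Splitting at the first colon rather than the last keeps a port or an IPv6 literal intact inside ``domain``.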
|
||||||
|
|
||||||
|
The precise grammar defining the allowable format of an identifier depends on
|
||||||
|
the type of identifier.
|
||||||
|
|
||||||
|
User Identifiers
|
||||||
|
++++++++++++++++
|
||||||
|
|
||||||
|
Users within Matrix are uniquely identified by their Matrix user ID. The user
|
||||||
|
ID is namespaced to the homeserver which allocated the account and has the
|
||||||
|
form::
|
||||||
|
|
||||||
|
@localpart:domain
|
||||||
|
|
||||||
|
The ``localpart`` of a user ID is an opaque identifier for that user. It MUST
|
||||||
|
NOT be empty, and MUST contain only the characters ``a-z``, ``0-9``, ``.``,
|
||||||
|
``_``, ``=``, ``-``, and ``/``.
|
||||||
|
|
||||||
|
The ``domain`` of a user ID is the `server name`_ of the homeserver which
|
||||||
|
allocated the account.
|
||||||
|
|
||||||
|
The length of a user ID, including the ``@`` sigil and the domain, MUST NOT
|
||||||
|
exceed 255 characters.
|
||||||
|
|
||||||
|
The complete grammar for a legal user ID is::
|
||||||
|
|
||||||
|
user_id = "@" user_id_localpart ":" server_name
|
||||||
|
user_id_localpart = 1*user_id_char
|
||||||
|
user_id_char = DIGIT
|
||||||
|
/ %x61-7A ; a-z
|
||||||
|
/ "-" / "." / "=" / "_" / "/"
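The grammar and the length limit above can be checked mechanically. The following Python sketch is one possible way to do so; ``is_valid_user_id`` is a hypothetical name, the server name is only checked for being non-empty, and the length is counted in characters as the text above states.

```python
import re

# user_id_char = DIGIT / a-z / "-" / "." / "=" / "_" / "/"
_LOCALPART_RE = re.compile(r"[0-9a-z\-.=_/]+")

# Maximum length of a user ID, including the "@" sigil and the domain.
MAX_USER_ID_LENGTH = 255


def is_valid_user_id(user_id):
    """Check a user ID against the (non-historical) grammar above.

    Hypothetical helper: the server name is only required to be
    non-empty; it is not validated against the server_name grammar.
    """
    if len(user_id) > MAX_USER_ID_LENGTH:
        return False
    if not user_id.startswith("@"):
        return False
    localpart, sep, server_name = user_id[1:].partition(":")
    if not sep or not server_name:
        return False
    # fullmatch with "+" also rejects an empty localpart.
    return _LOCALPART_RE.fullmatch(localpart) is not None
```

Under this sketch ``@cheeky_monkey:matrix.org`` is accepted, while ``@USER:matrix.org`` is rejected because of its upper-case characters.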
|
||||||
|
|
||||||
|
.. admonition:: Rationale
|
||||||
|
|
||||||
|
A number of factors were considered when defining the allowable characters
|
||||||
|
for a user ID.
|
||||||
|
|
||||||
|
Firstly, we chose to exclude characters outside the basic US-ASCII character
|
||||||
|
set. User IDs are primarily intended for use as an identifier at the protocol
|
||||||
|
level, and their use as a human-readable handle is of secondary
|
||||||
|
benefit. Furthermore, they are useful as a last-resort differentiator between
|
||||||
|
users with similar display names. Allowing the full unicode character set
|
||||||
|
would make it very difficult for a human to distinguish two similar user IDs. The
|
||||||
|
limited character set used has the advantage that even a user unfamiliar with
|
||||||
|
the Latin alphabet should be able to distinguish similar user IDs manually, if
|
||||||
|
somewhat laboriously.
|
||||||
|
|
||||||
|
We chose to disallow upper-case characters because we do not consider it
|
||||||
|
valid to have two user IDs which differ only in case: indeed it should be
|
||||||
|
possible to reach ``@user:matrix.org`` as ``@USER:matrix.org``. However,
|
||||||
|
user IDs are necessarily used in a number of situations which are inherently
|
||||||
|
case-sensitive (notably in the ``state_key`` of ``m.room.member``
|
||||||
|
events). Forbidding upper-case characters (and requiring homeservers to
|
||||||
|
downcase usernames when creating user IDs for new users) is a relatively simple
|
||||||
|
way to ensure that ``@USER:matrix.org`` cannot refer to a different user to
|
||||||
|
``@user:matrix.org``.
|
||||||
|
|
||||||
|
Finally, we decided to restrict the allowable punctuation to a very basic set
|
||||||
|
to reduce the possibility of conflicts with special characters in various
|
||||||
|
situations. For example, "*" is used as a wildcard in some APIs (notably the
|
||||||
|
filter API), so it cannot be a legal user ID character.
|
||||||
|
|
||||||
|
The length restriction is derived from the limit on the length of the
|
||||||
|
``sender`` key on events; since the user ID appears in every event sent by the
|
||||||
|
user, it is limited to ensure that the user ID does not dominate over the actual
|
||||||
|
content of the events.
|
||||||
|
|
||||||
|
Matrix user IDs are sometimes informally referred to as MXIDs.
|
||||||
|
|
||||||
|
Historical User IDs
|
||||||
|
<<<<<<<<<<<<<<<<<<<
|
||||||
|
|
||||||
|
Older versions of this specification were more tolerant of the characters
|
||||||
|
permitted in user ID localparts. There are currently active users whose user
|
||||||
|
IDs do not conform to the permitted character set, and a number of rooms whose
|
||||||
|
history includes events with a ``sender`` which does not conform. In order to
|
||||||
|
handle these rooms successfully, clients and servers MUST accept user IDs with
|
||||||
|
localparts from the expanded character set::
|
||||||
|
|
||||||
|
extended_user_id_char = %x21-7E
|
||||||
|
|
||||||
|
Mapping from other character sets
|
||||||
|
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
|
||||||
|
|
||||||
|
In certain circumstances it will be desirable to map from a wider character set
|
||||||
|
onto the limited character set allowed in a user ID localpart. Examples include
|
||||||
|
a homeserver creating a user ID for a new user based on the username passed to
|
||||||
|
``/register``, or a bridge mapping user IDs from another protocol.
|
||||||
|
|
||||||
|
.. TODO-spec
|
||||||
|
|
||||||
|
We need to better define the mechanism by which homeservers can allow users
|
||||||
|
to have non-Latin login credentials. The general idea is for clients to pass
|
||||||
|
the non-Latin value in the ``username`` field to ``/register`` and ``/login``, and
|
||||||
|
the HS then maps it onto the MXID space when turning it into the
|
||||||
|
fully-qualified ``user_id`` which is returned to the client and used in
|
||||||
|
events.
|
||||||
|
|
||||||
|
Implementations are free to do this mapping however they choose. Since the user
|
||||||
|
ID is opaque except to the implementation which created it, the only
|
||||||
|
requirement is that the implementation can perform the mapping
|
||||||
|
consistently. However, we suggest the following algorithm:
|
||||||
|
|
||||||
|
1. Encode character strings as UTF-8.
|
||||||
|
|
||||||
|
2. Convert the bytes ``A-Z`` to lower-case.
|
||||||
|
|
||||||
|
* In the case where a bridge must be able to distinguish two different users
|
||||||
|
with IDs which differ only by case, escape upper-case characters by
|
||||||
|
prefixing with ``_`` before downcasing. For example, ``A`` becomes
|
||||||
|
``_a``. Escape a real ``_`` with a second ``_``.
|
||||||
|
|
||||||
|
3. Encode any remaining bytes outside the allowed character set, as well as
|
||||||
|
``=``, as their hexadecimal value, prefixed with ``=``. For example, ``#``
|
||||||
|
becomes ``=23``; ``á`` becomes ``=c3=a1``.
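The three steps above can be sketched in Python as follows. This is an illustrative reading of the suggested algorithm, not a normative implementation: the function name ``map_to_localpart`` and the ``preserve_case`` flag (which enables the optional ``_`` escaping from step 2) are assumptions, and hexadecimal digits are written in lower case to match the examples.

```python
# Bytes that pass through unchanged. "=" is deliberately absent because
# step 3 always escapes it, even though it is a legal localpart character.
ALLOWED = frozenset(b"abcdefghijklmnopqrstuvwxyz0123456789._-/")


def map_to_localpart(name, preserve_case=False):
    """Map an arbitrary string onto the user ID localpart character set.

    Hypothetical helper following the suggested algorithm; with
    preserve_case=True, upper-case letters and "_" are escaped with "_".
    """
    out = []
    for byte in name.encode("utf-8"):          # step 1: UTF-8 bytes
        ch = chr(byte)
        if "A" <= ch <= "Z":                   # step 2: downcase A-Z
            out.append(("_" if preserve_case else "") + ch.lower())
        elif ch == "_" and preserve_case:      # escape a real "_"
            out.append("__")
        elif byte in ALLOWED:
            out.append(ch)
        else:                                  # step 3: "=" plus hex value
            out.append("=%02x" % byte)
    return "".join(out)
```

With ``preserve_case=True`` the input ``Alice`` maps to ``_alice``, so a bridge can still tell it apart from ``alice``.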
|
||||||
|
|
||||||
|
.. admonition:: Rationale
|
||||||
|
|
||||||
|
The suggested mapping is an attempt to preserve human-readability of simple
|
||||||
|
ASCII identifiers (unlike, for example, base-32), whilst still allowing
|
||||||
|
representation of *any* character (unlike punycode, which provides no way to
|
||||||
|
encode ASCII punctuation).
|
||||||
|
|
||||||
|
|
||||||
|
Room IDs and Event IDs
|
||||||
|
++++++++++++++++++++++
|
||||||
|
|
||||||
|
A room has exactly one room ID. A room ID has the format::
|
||||||
|
|
||||||
|
!opaque_id:domain
|
||||||
|
|
||||||
|
An event has exactly one event ID. An event ID has the format::
|
||||||
|
|
||||||
|
$opaque_id:domain
|
||||||
|
|
||||||
|
The ``domain`` of a room/event ID is the `server name`_ of the homeserver which
|
||||||
|
created the room/event. The domain is used only for namespacing to avoid the
|
||||||
|
risk of clashes of identifiers between different homeservers. There is no
|
||||||
|
implication that the room or event in question is still available at the
|
||||||
|
corresponding homeserver.
|
||||||
|
|
||||||
|
Event IDs and Room IDs are case-sensitive. They are not meant to be human
|
||||||
|
readable.
|
||||||
|
|
||||||
|
.. TODO-spec
|
||||||
|
What is the grammar for the opaque part? https://matrix.org/jira/browse/SPEC-389
|
||||||
|
|
||||||
|
Room Aliases
|
||||||
|
++++++++++++
|
||||||
|
|
||||||
|
A room may have zero or more aliases. A room alias has the format::
|
||||||
|
|
||||||
|
#room_alias:domain
|
||||||
|
|
||||||
|
The ``domain`` of a room alias is the `server name`_ of the homeserver which
|
||||||
|
created the alias. Other servers may contact this homeserver to look up the
|
||||||
|
alias.
|
||||||
|
|
||||||
|
Room aliases MUST NOT exceed 255 bytes (including the ``#`` sigil and the
|
||||||
|
domain).
|
||||||
|
|
||||||
|
.. TODO-spec
|
||||||
|
- Need to specify precise grammar for Room Aliases. https://matrix.org/jira/browse/SPEC-391
|
||||||
|
|
@ -27,17 +27,17 @@ Voice over IP (VoIP) signalling, Internet of Things (IoT) communication, and bri
|
||||||
together existing communication silos - providing the basis of a new open real-time
|
together existing communication silos - providing the basis of a new open real-time
|
||||||
communication ecosystem.
|
communication ecosystem.
|
||||||
|
|
||||||
`Introduction to Matrix <intro.html>`_ provides a full introduction to Matrix and the spec.
|
|
||||||
|
|
||||||
Matrix APIs
|
Matrix APIs
|
||||||
-----------
|
-----------
|
||||||
|
|
||||||
The following APIs are documented in this specification:
|
The specification consists of the following parts:
|
||||||
|
|
||||||
|
`Introduction to Matrix <intro.html>`_ provides a full introduction to Matrix and the spec.
|
||||||
|
|
||||||
{{apis}}
|
{{apis}}
|
||||||
|
|
||||||
`Appendices <appendices.html>`_ with supplemental information not specific to
|
The `Appendices <appendices.html>`_ contain supplemental information not specific to
|
||||||
one of the above APIs are also available.
|
one of the above APIs.
|
||||||
|
|
||||||
Specification Versions
|
Specification Versions
|
||||||
----------------------
|
----------------------
|
||||||
|
|
|
||||||
|
|
@ -157,9 +157,8 @@ allocated the account and has the form::
|
||||||
|
|
||||||
@localpart:domain
|
@localpart:domain
|
||||||
|
|
||||||
See the `Identifier Grammar`_ section for full details of the structure of
|
See the `appendices <appendices.html#identifier-grammar>`_ for full details of
|
||||||
user IDs.
|
the structure of user IDs.
|
||||||
|
|
||||||
|
|
||||||
Devices
|
Devices
|
||||||
~~~~~~~
|
~~~~~~~
|
||||||
|
|
@ -242,8 +241,8 @@ There is exactly one room ID for each room. Whilst the room ID does contain a
|
||||||
domain, it is simply for globally namespacing room IDs. The room does NOT
|
domain, it is simply for globally namespacing room IDs. The room does NOT
|
||||||
reside on the domain specified.
|
reside on the domain specified.
|
||||||
|
|
||||||
See the `Identifier Grammar`_ section for full details of the structure of
|
See the `appendices <appendices.html#identifier-grammar>`_ for full details of
|
||||||
a room ID.
|
the structure of a room ID.
|
||||||
|
|
||||||
The following conceptual diagram shows an
|
The following conceptual diagram shows an
|
||||||
``m.room.message`` event being sent to the room ``!qporfwt:matrix.org``::
|
``m.room.message`` event being sent to the room ``!qporfwt:matrix.org``::
|
||||||
|
|
@ -318,8 +317,8 @@ Each room can also have multiple "Room Aliases", which look like::
|
||||||
|
|
||||||
#room_alias:domain
|
#room_alias:domain
|
||||||
|
|
||||||
See the `Identifier Grammar`_ section for full details of the structure of
|
See the `appendices <appendices.html#identifier-grammar>`_ for full details of
|
||||||
a room alias.
|
the structure of a room alias.
|
||||||
|
|
||||||
A room alias "points" to a room ID and is the human-readable label by which
|
A room alias "points" to a room ID and is the human-readable label by which
|
||||||
rooms are publicised and discovered. The room ID the alias is pointing to can
|
rooms are publicised and discovered. The room ID the alias is pointing to can
|
||||||
|
|
@ -387,221 +386,6 @@ dedicated API. The API is symmetrical to managing Profile data.
|
||||||
Would it really be overengineered to use the same API for both profile &
|
Would it really be overengineered to use the same API for both profile &
|
||||||
private user data, but with different ACLs?
|
private user data, but with different ACLs?
|
||||||
|
|
||||||
|
|
||||||
Identifier Grammar
|
|
||||||
------------------
|
|
||||||
|
|
||||||
Server Name
|
|
||||||
~~~~~~~~~~~
|
|
||||||
|
|
||||||
A homeserver is uniquely identified by its server name. This value is used in a
|
|
||||||
number of identifiers, as described below.
|
|
||||||
|
|
||||||
The server name represents the address at which the homeserver in question can
|
|
||||||
be reached by other homeservers. The complete grammar is::
|
|
||||||
|
|
||||||
server_name = dns_name [ ":" port]
|
|
||||||
dns_name = host
|
|
||||||
port = *DIGIT
|
|
||||||
|
|
||||||
where ``host`` is as defined by `RFC3986, section 3.2.2
|
|
||||||
<https://tools.ietf.org/html/rfc3986#section-3.2.2>`_.
|
|
||||||
|
|
||||||
Examples of valid server names are:
|
|
||||||
|
|
||||||
* ``matrix.org``
|
|
||||||
* ``matrix.org:8888``
|
|
||||||
* ``1.2.3.4`` (IPv4 literal)
|
|
||||||
* ``1.2.3.4:1234`` (IPv4 literal with explicit port)
|
|
||||||
* ``[1234:5678::abcd]`` (IPv6 literal)
|
|
||||||
* ``[1234:5678::abcd]:5678`` (IPv6 literal with explicit port)
|
|
||||||
|
|
||||||
|
|
||||||
Common Identifier Format
|
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
The Matrix protocol uses a common format to assign unique identifiers to a
|
|
||||||
number of entities, including users, events and rooms. Each identifier takes
|
|
||||||
the form::
|
|
||||||
|
|
||||||
&localpart:domain
|
|
||||||
|
|
||||||
where ``&`` represents a 'sigil' character; ``domain`` is the `server name`_ of
|
|
||||||
the homeserver which allocated the identifier, and ``localpart`` is an
|
|
||||||
identifier allocated by that homeserver.
|
|
||||||
|
|
||||||
The sigil characters are as follows:
|
|
||||||
|
|
||||||
* ``@``: User ID
|
|
||||||
* ``!``: Room ID
|
|
||||||
* ``$``: Event ID
|
|
||||||
* ``#``: Room alias
|
|
||||||
|
|
||||||
The precise grammar defining the allowable format of an identifier depends on
|
|
||||||
the type of identifier.
|
|
||||||
|
|
||||||
User Identifiers
|
|
||||||
++++++++++++++++
|
|
||||||
|
|
||||||
Users within Matrix are uniquely identified by their Matrix user ID. The user
|
|
||||||
ID is namespaced to the homeserver which allocated the account and has the
|
|
||||||
form::
|
|
||||||
|
|
||||||
@localpart:domain
|
|
||||||
|
|
||||||
The ``localpart`` of a user ID is an opaque identifier for that user. It MUST
|
|
||||||
NOT be empty, and MUST contain only the characters ``a-z``, ``0-9``, ``.``,
|
|
||||||
``_``, ``=``, and ``-``.
|
|
||||||
|
|
||||||
The ``domain`` of a user ID is the `server name`_ of the homeserver which
|
|
||||||
allocated the account.
|
|
||||||
|
|
||||||
The length of a user ID, including the ``@`` sigil and the domain, MUST NOT
|
|
||||||
exceed 255 characters.
|
|
||||||
|
|
||||||
The complete grammar for a legal user ID is::
|
|
||||||
|
|
||||||
user_id = "@" user_id_localpart ":" server_name
|
|
||||||
user_id_localpart = 1*user_id_char
|
|
||||||
user_id_char = DIGIT
|
|
||||||
/ %x61-7A ; a-z
|
|
||||||
/ "-" / "." / "=" / "_"
|
|
||||||
|
|
||||||
.. admonition:: Rationale
|
|
||||||
|
|
||||||
A number of factors were considered when defining the allowable characters
|
|
||||||
for a user ID.
|
|
||||||
|
|
||||||
Firstly, we chose to exclude characters outside the basic US-ASCII character
|
|
||||||
set. User IDs are primarily intended for use as an identifier at the protocol
|
|
||||||
level, and their use as a human-readable handle is of secondary
|
|
||||||
benefit. Furthermore, they are useful as a last-resort differentiator between
|
|
||||||
users with similar display names. Allowing the full unicode character set
|
|
||||||
would make very difficult for a human to distinguish two similar user IDs. The
|
|
||||||
limited character set used has the advantage that even a user unfamiliar with
|
|
||||||
the Latin alphabet should be able to distinguish similar user IDs manually, if
|
|
||||||
somewhat laboriously.
|
|
||||||
|
|
||||||
We chose to disallow upper-case characters because we do not consider it
|
|
||||||
valid to have two user IDs which differ only in case: indeed it should be
|
|
||||||
possible to reach ``@user:matrix.org`` as ``@USER:matrix.org``. However,
|
|
||||||
user IDs are necessarily used in a number of situations which are inherently
|
|
||||||
case-sensitive (notably in the ``state_key`` of ``m.room.member``
|
|
||||||
events). Forbidding upper-case characters (and requiring homeservers to
|
|
||||||
downcase usernames when creating user IDs for new users) is a relatively simple
|
|
||||||
way to ensure that ``@USER:matrix.org`` cannot refer to a different user to
|
|
||||||
``@user:matrix.org``.
|
|
||||||
|
|
||||||
Finally, we decided to restrict the allowable punctuation to a very basic set
|
|
||||||
to ensure that the identifier can be used as-is in as wide a number of
|
|
||||||
situations as possible, without requiring escaping. For instance, allowing
|
|
||||||
"%" or "/" would make it harder to use a user ID in a URI. "*" is used as a
|
|
||||||
wildcard in some APIs (notably the filter API), so it also cannot be a legal
|
|
||||||
user ID character.
|
|
||||||
|
|
||||||
The length restriction is derived from the limit on the length of the
|
|
||||||
``sender`` key on events; since the user ID appears in every event sent by the
|
|
||||||
user, it is limited to ensure that the user ID does not dominate over the actual
|
|
||||||
content of the events.
|
|
||||||
|
|
||||||
Matrix user IDs are sometimes informally referred to as MXIDs.
|
|
||||||
|
|
||||||
Historical User IDs
|
|
||||||
<<<<<<<<<<<<<<<<<<<
|
|
||||||
|
|
||||||
Older versions of this specification were more tolerant of the characters
|
|
||||||
permitted in user ID localparts. There are currently active users whose user
|
|
||||||
IDs do not conform to the permitted character set, and a number of rooms whose
|
|
||||||
history includes events with a ``sender`` which does not conform. In order to
|
|
||||||
handle these rooms successfully, clients and servers MUST accept user IDs with
|
|
||||||
localparts from the expanded character set::
|
|
||||||
|
|
||||||
extended_user_id_char = %x21-7E
|
|
||||||
|
|
||||||
Mapping from other character sets
|
|
||||||
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
|
|
||||||
|
|
||||||
In certain circumstances it will be desirable to map from a wider character set
|
|
||||||
onto the limited character set allowed in a user ID localpart. Examples include
|
|
||||||
a homeserver creating a user ID for a new user based on the username passed to
|
|
||||||
``/register``, or a bridge mapping user ids from another protocol.
|
|
||||||
|
|
||||||
.. TODO-spec
|
|
||||||
|
|
||||||
We need to better define the mechanism by which homeservers can allow users
|
|
||||||
to have non-Latin login credentials. The general idea is for clients to pass
|
|
||||||
the non-Latin in the ``username`` field to ``/register`` and ``/login``, and
|
|
||||||
the HS then maps it onto the MXID space when turning it into the
|
|
||||||
fully-qualified ``user_id`` which is returned to the client and used in
|
|
||||||
events.
|
|
||||||
|
|
||||||
Implementations are free to do this mapping however they choose. Since the user
|
|
||||||
ID is opaque except to the implementation which created it, the only
|
|
||||||
requirement is that the implemention can perform the mapping
|
|
||||||
consistently. However, we suggest the following algorithm:
|
|
||||||
|
|
||||||
1. Encode character strings as UTF-8.
|
|
||||||
|
|
||||||
2. Convert the bytes ``A-Z`` to lower-case.
|
|
||||||
|
|
||||||
* In the case where a bridge must be able to distinguish two different users
|
|
||||||
with ids which differ only by case, escape upper-case characters by
|
|
||||||
prefixing with ``_`` before downcasing. For example, ``A`` becomes
|
|
||||||
``_a``. Escape a real ``_`` with a second ``_``.
|
|
||||||
|
|
||||||
3. Encode any remaining bytes outside the allowed character set, as well as
|
|
||||||
``=``, as their hexadecimal value, prefixed with ``=``. For example, ``#``
|
|
||||||
becomes ``=23``; ``á`` becomes ``=c3=a1``.
|
|
||||||
|
|
||||||
.. admonition:: Rationale
|
|
||||||
|
|
||||||
The suggested mapping is an attempt to preserve human-readability of simple
|
|
||||||
ASCII identifiers (unlike, for example, base-32), whilst still allowing
|
|
||||||
representation of *any* character (unlike punycode, which provides no way to
|
|
||||||
encode ASCII punctuation).
|
|
||||||
|
|
||||||
|
|
||||||
Room IDs and Event IDs
|
|
||||||
++++++++++++++++++++++
|
|
||||||
|
|
||||||
A room has exactly one room ID. A room ID has the format::
|
|
||||||
|
|
||||||
!opaque_id:domain
|
|
||||||
|
|
||||||
An event has exactly one event ID. An event ID has the format::
|
|
||||||
|
|
||||||
$opaque_id:domain
|
|
||||||
|
|
||||||
The ``domain`` of a room/event ID is the `server name`_ of the homeserver which
|
|
||||||
created the room/event. The domain is used only for namespacing to avoid the
|
|
||||||
risk of clashes of identifiers between different homeservers. There is no
|
|
||||||
implication that the room or event in question is still available at the
|
|
||||||
corresponding homeserver.
|
|
||||||
|
|
||||||
Event IDs and Room IDs are case-sensitive. They are not meant to be human
|
|
||||||
readable.
|
|
||||||
|
|
||||||
.. TODO-spec
|
|
||||||
What is the grammar for the opaque part? https://matrix.org/jira/browse/SPEC-389
|
|
||||||
|
|
||||||
Room Aliases
|
|
||||||
++++++++++++
|
|
||||||
|
|
||||||
A room may have zero or more aliases. A room alias has the format::
|
|
||||||
|
|
||||||
#room_alias:domain
|
|
||||||
|
|
||||||
The ``domain`` of a room alias is the `server name`_ of the homeserver which
|
|
||||||
created the alias. Other servers may contact this homeserver to look up the
|
|
||||||
alias.
|
|
||||||
|
|
||||||
Room aliases MUST NOT exceed 255 bytes (including the ``#`` sigil and the
|
|
||||||
domain).
|
|
||||||
|
|
||||||
.. TODO-spec
|
|
||||||
- Need to specify precise grammar for Room Aliases. https://matrix.org/jira/browse/SPEC-391
|
|
||||||
|
|
||||||
|
|
||||||
License
|
License
|
||||||
-------
|
-------
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -34,6 +34,7 @@ targets:
|
||||||
- appendices.rst
|
- appendices.rst
|
||||||
- appendices/base64.rst
|
- appendices/base64.rst
|
||||||
- appendices/signing_json.rst
|
- appendices/signing_json.rst
|
||||||
|
- appendices/identifier_grammar.rst
|
||||||
- appendices/threat_model.rst
|
- appendices/threat_model.rst
|
||||||
- appendices/test_vectors.rst
|
- appendices/test_vectors.rst
|
||||||
groups: # reusable blobs of files when prefixed with 'group:'
|
groups: # reusable blobs of files when prefixed with 'group:'
|
||||||
|
|
|
||||||