mirror of
https://github.com/matrix-org/matrix-spec
synced 2026-04-17 00:24:10 +02:00
update MSC1884 to reflect new conclusions following discussion on the PR
This commit is contained in:
parent
82258fc0fc
commit
207d6cf851
|
|
@ -11,12 +11,20 @@ solution is to ensure that event IDs are URL-encoded, so that `/` is instead
|
||||||
represented as `%2F`. However, this is not entirely satisfactory for a number
|
represented as `%2F`. However, this is not entirely satisfactory for a number
|
||||||
of reasons:
|
of reasons:
|
||||||
|
|
||||||
|
* The act of escaping and unescaping slash characters when doing casual
|
||||||
|
development and ops work becomes an constant and annoying chore which
|
||||||
|
is entirely avoidable. Whenever using tools like `curl` and `grep` or
|
||||||
|
manipulating SQL, developers will have to constantly keep in mind whether
|
||||||
|
they are dealing with escaped or unescaped IDs, and manually convert between
|
||||||
|
the two as needed. This will only get worse with further keys-as-IDs
|
||||||
|
landing with MSC1228.
|
||||||
|
|
||||||
* There exist a number of client (and possibly server) implementations which
|
* There exist a number of client (and possibly server) implementations which
|
||||||
do not currently URL-encode such parameters; these are therefore broken by
|
do not currently URL-encode such parameters; these are therefore broken by
|
||||||
such event IDs and must be updated. Furthermore, all future client
|
such event IDs and must be updated. Furthermore, all future client
|
||||||
implementers must remember to do the encoding correctly.
|
implementers must remember to do the encoding correctly.
|
||||||
|
|
||||||
* Even if client implementations do rembember to URL-encode their parameters,
|
* Even if client implementations do remember to URL-encode their parameters,
|
||||||
they may not do it correctly: many URL-encoding implementations may be
|
they may not do it correctly: many URL-encoding implementations may be
|
||||||
intended to encode parameters in the query-string (which can of course
|
intended to encode parameters in the query-string (which can of course
|
||||||
contain literal slashes) rather tha the path component.
|
contain literal slashes) rather tha the path component.
|
||||||
|
|
@ -27,6 +35,14 @@ of reasons:
|
||||||
existing setups will be broken by this change, and it is a trap for new
|
existing setups will be broken by this change, and it is a trap for new
|
||||||
users of the software.
|
users of the software.
|
||||||
|
|
||||||
|
* Cosmetically, URL-escaping base64 in otherwise-constant-length IDs results
|
||||||
|
in variable length IDs, making it harder to visually scan lists of IDs and
|
||||||
|
manipulate them in columnar form when doing devops work.
|
||||||
|
|
||||||
|
* Those developing against the CS API might reasonably expect us to use
|
||||||
|
URL-safe identifiers in URLs where available, rather than deliberately
|
||||||
|
choosing non-URL-safe IDs, which could be seen as developer-unfriendly.
|
||||||
|
|
||||||
## Proposal
|
## Proposal
|
||||||
|
|
||||||
This MSC proposes that we should introduce a new room version, in which event
|
This MSC proposes that we should introduce a new room version, in which event
|
||||||
|
|
@ -34,6 +50,22 @@ IDs are encoded using the [URL-safe Base64
|
||||||
encoding](https://tools.ietf.org/html/rfc4648#section-5) (which uses `-` and
|
encoding](https://tools.ietf.org/html/rfc4648#section-5) (which uses `-` and
|
||||||
`_` as the 62nd and 63rd characters instead of `+` and `/`).
|
`_` as the 62nd and 63rd characters instead of `+` and `/`).
|
||||||
|
|
||||||
|
URL-safe Base64 encoding is then used consistently for encoding binary
|
||||||
|
identifiers in the CS API - particularly in upcoming MSC1228 IDs for rooms and
|
||||||
|
users, such that typical CS API developers should be able to safely assume
|
||||||
|
that for all common cases they should use URL-safe Base64 when decoding base64
|
||||||
|
strings.
|
||||||
|
|
||||||
|
The exception would be for E2EE data (device keys and signatures etc) which
|
||||||
|
currently use normal Base64 with no easy mechanism to migrate to a new encoding.
|
||||||
|
Given E2EE development is rare and requires expert skills, it seems acceptable
|
||||||
|
to expect E2EE developers to be able to use the right encoding without tripping
|
||||||
|
up significantly.
|
||||||
|
|
||||||
|
Similarly, the S2S API could continue to use standard base64-encoded hashes and
|
||||||
|
signatures, given they are only exposed to S2S API developers who are necessarily
|
||||||
|
expert and should be able to correctly pick the right encoding.
|
||||||
|
|
||||||
## Counterarguments
|
## Counterarguments
|
||||||
|
|
||||||
1. Inconsistency. Base64 encoding is used heavily elsewhere in the matrix
|
1. Inconsistency. Base64 encoding is used heavily elsewhere in the matrix
|
||||||
|
|
@ -45,6 +77,14 @@ encoding](https://tools.ietf.org/html/rfc4648#section-5) (which uses `-` and
|
||||||
Changing event IDs alone would therefore leave us with a confusing mix of
|
Changing event IDs alone would therefore leave us with a confusing mix of
|
||||||
encodings.
|
encodings.
|
||||||
|
|
||||||
|
However, the current uses of standard Base64 encodings are not exposed to
|
||||||
|
common CS API developers, and so whilst this might be slightly confusing
|
||||||
|
for the minority of expert homeserver developers, the confusion does not
|
||||||
|
exist today for client developers. Therefore it seems safe to standardise
|
||||||
|
on URL-safe Base64 for identifiers exposed to the client developers, who
|
||||||
|
form by far the majority of the Matrix ecosystem today, and expect as
|
||||||
|
simple an API as possible.
|
||||||
|
|
||||||
A potential extension would be to change *all* Base64 encodings to be
|
A potential extension would be to change *all* Base64 encodings to be
|
||||||
URL-safe. This would address the inconsistency. However, it feels like a
|
URL-safe. This would address the inconsistency. However, it feels like a
|
||||||
large job which would span the entire matrix ecosystem (far larger than
|
large job which would span the entire matrix ecosystem (far larger than
|
||||||
|
|
@ -70,6 +110,16 @@ encoding](https://tools.ietf.org/html/rfc4648#section-5) (which uses `-` and
|
||||||
Of course, an alternative is to modify the grammars of all of these
|
Of course, an alternative is to modify the grammars of all of these
|
||||||
identifiers to forbid slashes.
|
identifiers to forbid slashes.
|
||||||
|
|
||||||
|
The counter-counterargument to this is that it is of course best practice
|
||||||
|
for implementations is to URL-escape any IDs used in URLs, and human-selected
|
||||||
|
IDs such as Room aliases, Group IDs, Matrix user IDs etc apply an adequate
|
||||||
|
forcing function already to remind developers to do this. However,
|
||||||
|
it doesn't follow that we should then also deliberately pick URL-unsafe
|
||||||
|
encodings for machine-selected IDs - the argument that it is better for software
|
||||||
|
to fail 50% of the time to force a fix is irrelevant when the possibility
|
||||||
|
exists for the software to fail 0% of the time in the first place by picking
|
||||||
|
an identifier format which cannot fail.
|
||||||
|
|
||||||
[1] Discussion remains open as to whether allowing slashes in User IDs was a
|
[1] Discussion remains open as to whether allowing slashes in User IDs was a
|
||||||
good idea.
|
good idea.
|
||||||
|
|
||||||
|
|
@ -87,5 +137,29 @@ solutions to that are also possible).
|
||||||
|
|
||||||
## Conclusion
|
## Conclusion
|
||||||
|
|
||||||
It's unclear to me that changing the format of event IDs alone solves any
|
There are two main questions here:
|
||||||
problems.
|
|
||||||
|
1. Whether it's worth forcing casual CS API developers to juggle escaping of
|
||||||
|
machine-selected IDs in order to remind them to escape all variables in
|
||||||
|
their URIs correctly.
|
||||||
|
|
||||||
|
2. Whether it's a significant problem for E2EE & SS API developers to have to
|
||||||
|
handle strings which are a mix of standard Base64 and URL-safe Base64
|
||||||
|
encodings.
|
||||||
|
|
||||||
|
Both of these are a subjective judgement call.
|
||||||
|
|
||||||
|
Given we wish the CS API particularly to be as easy for casual developers to
|
||||||
|
use as possible, it feels that we should find another way to encourage
|
||||||
|
developers to escape variables in their URLs in general - e.g. by recommending
|
||||||
|
that developers test their clients against a 'torture room' full of exotic IDs
|
||||||
|
and data, or by improving warnings in the spec... rather than (ab)using
|
||||||
|
machine-selected IDs as a reminder.
|
||||||
|
|
||||||
|
Meanwhile, given we have many more CS API developers than SS or E2EE developers,
|
||||||
|
and we wish to make the CS API particularly easy for casual developers to use,
|
||||||
|
it feels we should not prioritise consistency of encodings for SS/E2EE developers
|
||||||
|
over the usability of the CS API.
|
||||||
|
|
||||||
|
Therefore, on balance, it seems plausible that changing the format of event IDs
|
||||||
|
does solve sufficient problems to make it desirable.
|
||||||
Loading…
Reference in a new issue